gringer / bioinfscripts

Bioinformatics scripts produced over the course of my work. Now maintained on GitLab.
https://gitlab.com/gringer/bioinfscripts
GNU General Public License v3.0

Data format enhancement #2

Closed by noncodo 7 years ago

noncodo commented 7 years ago

A nice addition to fast5extractor.py would be a check of what data is available in the fast5 file (e.g., raw, basecall, etc.).

gringer commented 7 years ago

Can you explain a bit more about why that would be useful? I'm not sure how it would improve on what's already available using h5ls. Most of the information that is relevant for data analysis (e.g. raw sequence length, template/complement fastq length) will eventually go into the telemetry function. Here is what h5ls already shows:

$ h5ls -r COLLES_L160691_20160605_FNFAD11809_MN17534_sequencing_run_Zika_Library2_12plex_72015_ch388_read406_strand.fast5 | perl -pe 's/\s+Group$//'
/
/Analyses
/Analyses/Barcoding_000
/Analyses/Barcoding_000/Barcoding
/Analyses/Barcoding_000/Barcoding/Aligns Dataset {756}
/Analyses/Barcoding_000/Barcoding/Fastq Dataset {SCALAR}
/Analyses/Barcoding_000/Configuration
/Analyses/Barcoding_000/Configuration/aggregator
/Analyses/Barcoding_000/Configuration/barcoding
/Analyses/Barcoding_000/Configuration/basecall_1d
/Analyses/Barcoding_000/Configuration/basecall_2d
/Analyses/Barcoding_000/Configuration/calibration_strand
/Analyses/Barcoding_000/Configuration/components
/Analyses/Barcoding_000/Configuration/general
/Analyses/Barcoding_000/Configuration/hairpin_align
/Analyses/Barcoding_000/Configuration/post_processing.3000Hz
/Analyses/Barcoding_000/Configuration/split_hairpin
/Analyses/Barcoding_000/Log Dataset {SCALAR}
/Analyses/Barcoding_000/Summary
/Analyses/Barcoding_000/Summary/barcoding
/Analyses/Basecall_1D_000
/Analyses/Basecall_1D_000/BaseCalled_complement
/Analyses/Basecall_1D_000/BaseCalled_complement/Events Dataset {602}
/Analyses/Basecall_1D_000/BaseCalled_complement/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_template
/Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {805}
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Configuration
/Analyses/Basecall_1D_000/Configuration/aggregator
/Analyses/Basecall_1D_000/Configuration/basecall_1d
/Analyses/Basecall_1D_000/Configuration/basecall_2d
/Analyses/Basecall_1D_000/Configuration/calibration_strand
/Analyses/Basecall_1D_000/Configuration/components
/Analyses/Basecall_1D_000/Configuration/event_detection
/Analyses/Basecall_1D_000/Configuration/general
/Analyses/Basecall_1D_000/Configuration/hairpin_align
/Analyses/Basecall_1D_000/Configuration/post_processing
/Analyses/Basecall_1D_000/Configuration/post_processing.4000Hz
/Analyses/Basecall_1D_000/Configuration/split_hairpin
/Analyses/Basecall_1D_000/Log Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary
/Analyses/Basecall_1D_000/Summary/basecall_1d_complement
/Analyses/Basecall_1D_000/Summary/basecall_1d_template
/Analyses/Basecall_2D_000
/Analyses/Basecall_2D_000/BaseCalled_2D
/Analyses/Basecall_2D_000/BaseCalled_2D/Alignment Dataset {939}
/Analyses/Basecall_2D_000/BaseCalled_2D/Fastq Dataset {SCALAR}
/Analyses/Basecall_2D_000/Configuration
/Analyses/Basecall_2D_000/Configuration/aggregator
/Analyses/Basecall_2D_000/Configuration/basecall_1d
/Analyses/Basecall_2D_000/Configuration/basecall_2d
/Analyses/Basecall_2D_000/Configuration/calibration_strand
/Analyses/Basecall_2D_000/Configuration/components
/Analyses/Basecall_2D_000/Configuration/event_detection
/Analyses/Basecall_2D_000/Configuration/general
/Analyses/Basecall_2D_000/Configuration/hairpin_align
/Analyses/Basecall_2D_000/Configuration/post_processing
/Analyses/Basecall_2D_000/Configuration/post_processing.4000Hz
/Analyses/Basecall_2D_000/Configuration/split_hairpin
/Analyses/Basecall_2D_000/HairpinAlign
/Analyses/Basecall_2D_000/HairpinAlign/Alignment Dataset {662}
/Analyses/Basecall_2D_000/Log Dataset {SCALAR}
/Analyses/Basecall_2D_000/Summary
/Analyses/Basecall_2D_000/Summary/basecall_2d
/Analyses/Basecall_2D_000/Summary/hairpin_align
/Analyses/Basecall_2D_000/Summary/post_process_complement
/Analyses/Basecall_2D_000/Summary/post_process_template
/Analyses/Calibration_Strand_000
/Analyses/Calibration_Strand_000/Configuration
/Analyses/Calibration_Strand_000/Configuration/aggregator
/Analyses/Calibration_Strand_000/Configuration/basecall_1d
/Analyses/Calibration_Strand_000/Configuration/basecall_2d
/Analyses/Calibration_Strand_000/Configuration/calibration_strand
/Analyses/Calibration_Strand_000/Configuration/components
/Analyses/Calibration_Strand_000/Configuration/general
/Analyses/Calibration_Strand_000/Configuration/genome_mapping
/Analyses/Calibration_Strand_000/Configuration/hairpin_align
/Analyses/Calibration_Strand_000/Configuration/post_processing.3000Hz
/Analyses/Calibration_Strand_000/Configuration/split_hairpin
/Analyses/Calibration_Strand_000/Log Dataset {SCALAR}
/Analyses/Calibration_Strand_000/Summary
/Analyses/EventDetection_000
/Analyses/EventDetection_000/Configuration
/Analyses/EventDetection_000/Configuration/aggregator
/Analyses/EventDetection_000/Configuration/basecall_1d
/Analyses/EventDetection_000/Configuration/basecall_2d
/Analyses/EventDetection_000/Configuration/calibration_strand
/Analyses/EventDetection_000/Configuration/components
/Analyses/EventDetection_000/Configuration/event_detection
/Analyses/EventDetection_000/Configuration/general
/Analyses/EventDetection_000/Configuration/hairpin_align
/Analyses/EventDetection_000/Configuration/post_processing
/Analyses/EventDetection_000/Configuration/post_processing.4000Hz
/Analyses/EventDetection_000/Configuration/split_hairpin
/Analyses/EventDetection_000/Log Dataset {SCALAR}
/Analyses/EventDetection_000/Reads
/Analyses/EventDetection_000/Reads/Read_406
/Analyses/EventDetection_000/Reads/Read_406/Events Dataset {1461}
/Analyses/EventDetection_000/Summary
/Analyses/EventDetection_000/Summary/event_detection
/Analyses/Hairpin_Split_000
/Analyses/Hairpin_Split_000/Configuration
/Analyses/Hairpin_Split_000/Configuration/aggregator
/Analyses/Hairpin_Split_000/Configuration/basecall_1d
/Analyses/Hairpin_Split_000/Configuration/basecall_2d
/Analyses/Hairpin_Split_000/Configuration/calibration_strand
/Analyses/Hairpin_Split_000/Configuration/components
/Analyses/Hairpin_Split_000/Configuration/event_detection
/Analyses/Hairpin_Split_000/Configuration/general
/Analyses/Hairpin_Split_000/Configuration/hairpin_align
/Analyses/Hairpin_Split_000/Configuration/post_processing
/Analyses/Hairpin_Split_000/Configuration/post_processing.4000Hz
/Analyses/Hairpin_Split_000/Configuration/split_hairpin
/Analyses/Hairpin_Split_000/Log Dataset {SCALAR}
/Analyses/Hairpin_Split_000/Summary
/Analyses/Hairpin_Split_000/Summary/split_hairpin
/Raw
/Raw/Reads
/Raw/Reads/Read_406
/Raw/Reads/Read_406/Signal Dataset {20403/Inf}
/UniqueGlobalKey
/UniqueGlobalKey/channel_id
/UniqueGlobalKey/context_tags
/UniqueGlobalKey/tracking_id
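
As a rough illustration of the check requested above, a listing like this one could be reduced to a short summary of which data types are present. The following is only a sketch: it parses the text output of `h5ls -r` rather than opening the fast5 file, and the names `DATA_PATTERNS`, `DATASET_LINE` and `summarise_listing` are hypothetical, not part of fast5extractor.py. The path patterns are taken from the listing above and may not match fast5 files produced by other software versions.

```python
import re

# Hypothetical labels mapped to the dataset paths that indicate them.
# Patterns are based only on the h5ls listing shown in this thread.
DATA_PATTERNS = {
    "raw_signal": re.compile(r"^/Raw/Reads/Read_\d+/Signal$"),
    "events": re.compile(r"^/Analyses/EventDetection_\d+/Reads/Read_\d+/Events$"),
    "template_fastq": re.compile(
        r"^/Analyses/Basecall_1D_\d+/BaseCalled_template/Fastq$"),
    "complement_fastq": re.compile(
        r"^/Analyses/Basecall_1D_\d+/BaseCalled_complement/Fastq$"),
    "2d_fastq": re.compile(r"^/Analyses/Basecall_2D_\d+/BaseCalled_2D/Fastq$"),
    "barcode_fastq": re.compile(r"^/Analyses/Barcoding_\d+/Barcoding/Fastq$"),
}

# Matches dataset lines such as "/Raw/Reads/Read_406/Signal Dataset {20403/Inf}"
DATASET_LINE = re.compile(r"^(\S+)\s+Dataset\s+\{([^}]*)\}\s*$")


def summarise_listing(listing):
    """Return {label: shape} for each known data type in `h5ls -r` output."""
    found = {}
    for line in listing.splitlines():
        match = DATASET_LINE.match(line.strip())
        if match is None:
            continue  # group lines and blank lines are not datasets
        path, shape = match.groups()
        for label, pattern in DATA_PATTERNS.items():
            if pattern.match(path):
                found[label] = shape
    return found


if __name__ == "__main__":
    example = (
        "/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}\n"
        "/Raw/Reads/Read_406/Signal Dataset {20403/Inf}\n"
    )
    for label, shape in sorted(summarise_listing(example).items()):
        print(label, shape)
```

The shape reported for a Fastq dataset is only `SCALAR`, so this tells you that a basecall exists but not how long it is; sequence lengths would still need to come from reading the dataset itself.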