haotianteng / Chiron

A basecaller for Oxford Nanopore Technologies' sequencers

basecall_group #67

Closed: pcjedi closed this issue 6 years ago

pcjedi commented 6 years ago

The default basecall_group changed to cwDTWCorrected_000, but none of the provided sample fast5 files at https://data.genomicsresearch.org/Projects/train_set_all/ have this key.

Also, the description of the basecall_group parameter names "Basecall_1D_000" as the default. That key does not provide "read_start_rel_to_raw", only "duration" and "start_time". Is "start_time" perhaps the equivalent here? For me, "RawGenomeCorrected_000" is the only option that works. Is the distinction between the options for this parameter documented anywhere?

Of the 34,383 provided sample fast5 files, the number containing each key is:
cwDTWCorrected_000 (default): 0
RawGenomeCorrected_000: 29,849
Basecall_1D_000: 34,383 (all)
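Counts like the ones above can be reproduced with a short script. This is a minimal sketch, not code from Chiron: it assumes single-read fast5 files in which the basecall groups sit directly under the top-level `Analyses` group, and the function names (`analysis_groups`, `count_groups`, `survey`) are hypothetical. It needs the third-party `h5py` package to read the files.

```python
from collections import Counter
from pathlib import Path


def analysis_groups(fast5_path):
    """Return the names of the groups under /Analyses in one fast5 file.

    Assumption: the file is a single-read fast5 with an /Analyses group.
    """
    import h5py  # third-party; imported lazily so the counting logic stays stdlib-only

    with h5py.File(fast5_path, "r") as f:
        analyses = f.get("Analyses")
        return set(analyses.keys()) if analyses is not None else set()


def count_groups(group_sets):
    """Count in how many files each group name occurs."""
    counts = Counter()
    for groups in group_sets:
        counts.update(groups)
    return counts


def survey(directory):
    """Return (number of fast5 files, per-group file counts) for a directory tree."""
    paths = sorted(Path(directory).glob("**/*.fast5"))
    return len(paths), count_groups(analysis_groups(p) for p in paths)
```

Calling `survey("train_set_all")` on a local copy of the sample data (the directory name is only an example) returns the total file count and a `Counter`. Looking up a key that never occurs, such as `counts["cwDTWCorrected_000"]`, yields 0 rather than raising an error.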

haotianteng commented 6 years ago

cwDTWCorrected_000 is generated by chron_label.py. RawGenomeCorrected_000 is generated by Tombo. train_set_all only has RawGenomeCorrected_000 because its files were labelled by Tombo.

Basecall_1D_000 is generated by Albacore and is not ready for training because it has a different format. The description has been fixed; thank you for the reminder.
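For anyone scripting around this, the distinction described in this thread can be captured in a small lookup table. This is a sketch under assumptions, not part of Chiron: `GROUP_INFO` and `pick_training_group` are hypothetical names, the `trainable` flag for cwDTWCorrected_000 is inferred from it being the training default, and the preference order (default first) is an arbitrary choice.

```python
# Provenance of each basecall_group value, as described in this thread.
# The "trainable" flags are an interpretation of the maintainer's answer.
GROUP_INFO = {
    "cwDTWCorrected_000": {"generated_by": "chron_label.py", "trainable": True},
    "RawGenomeCorrected_000": {"generated_by": "Tombo", "trainable": True},
    "Basecall_1D_000": {"generated_by": "Albacore", "trainable": False},
}


def pick_training_group(present_groups):
    """Return the first training-ready group found in present_groups, or None.

    Groups are tried in the order they are listed in GROUP_INFO.
    """
    for name, info in GROUP_INFO.items():
        if info["trainable"] and name in present_groups:
            return name
    return None
```

For a Tombo-labelled file that contains both Basecall_1D_000 and RawGenomeCorrected_000, this picks RawGenomeCorrected_000, which matches what worked for the train_set_all files.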