tombo preprocess can not attach basecall information

AyanoClarke commented 1 year ago

Hi everyone,

When I run tombo preprocess annotate_raw_with_fastqs, no reads get annotated. The full command for a small example is:

# directory fast5/ stores the read b82c9dc6-29df-4807-8c88-9911a29f503c's fast5
> tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5/ --fastq-filenames b82c9dc6-29df-4807-8c88-9911a29f503c.fastq
# [13:24:23] Preparing reads and extracting read identifiers.
# 100%|█████████████████████████████| 1/1 [00:00<00:00,  9.90it/s]
# [13:24:24] Annotating FAST5s with sequence from FASTQs.
# ****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
# 0it [00:00, ?it/s]
# [13:24:24] Added sequences to a total of 0 reads.
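
The warning above says that some FASTQ read identifiers could not be matched. A rough sketch of how one might check this by hand is below. The helper names are hypothetical, the FASTQ is assumed to use four lines per record, and the comparison assumes each single-read FAST5 is named <read_id>.fast5 (as in this example). Tombo itself matches on the read ID stored inside the file, so this is only a quick sanity check.

```shell
# Hypothetical helper: print the read identifier of every record in a
# 4-line-per-record FASTQ read from stdin (the first token after '@').
fastq_read_ids() {
  awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }'
}

# Hypothetical helper: print FASTQ read IDs with no matching file.
# Assumption: FAST5 files are named <read_id>.fast5 inside the directory.
missing_fast5_for_fastq() {
  fastq=$1
  fast5_dir=$2
  fastq_read_ids < "$fastq" | while read -r id; do
    [ -e "$fast5_dir/$id.fast5" ] || echo "$id"
  done
}
```

For example, `missing_fast5_for_fastq b82c9dc6-29df-4807-8c88-9911a29f503c.fastq fast5/` prints nothing when every FASTQ record has a file with the matching name.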

I also checked the fast5 file with h5ls -r and found that it has no Analyses group.

> h5ls -r fast5/b82c9dc6-29df-4807-8c88-9911a29f503c.fast5
#/                        Group
#/Raw                     Group
#/Raw/Reads               Group
#/Raw/Reads/Read_8        Group
#/Raw/Reads/Read_8/Signal Dataset {4963/Inf}
#/UniqueGlobalKey         Group
#/UniqueGlobalKey/channel_id Group
#/UniqueGlobalKey/context_tags Group
#/UniqueGlobalKey/tracking_id Group
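
To run the same check over many files, something like the sketch below may help. The function names are hypothetical. It only inspects an `h5ls -r` listing for a Fastq dataset under /Analyses/Basecall_1D_*/BaseCalled_template, the layout that appears in basecalled files, and it assumes h5ls is installed.

```shell
# Hypothetical helper: reads an `h5ls -r` listing on stdin and succeeds only
# if it contains a Fastq dataset under a Basecall_1D_* group. A leading '#'
# (as in the pasted listings in this thread) is tolerated.
has_basecalls() {
  grep -Eq '^#?/Analyses/Basecall_1D_[0-9]+/BaseCalled_template/Fastq[[:space:]]'
}

# Hypothetical helper: report every FAST5 in a directory without basecalls.
# Assumption: h5ls is on PATH and the files are single-read FAST5s.
list_unannotated() {
  for f in "$1"/*.fast5; do
    h5ls -r "$f" | has_basecalls || echo "no basecalls: $f"
  done
}
```

Calling `list_unannotated fast5/` would then print one line per file that still lacks basecall information.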

This problem appeared after the software and file format were updated on the ONT sequencer. How can it be solved? Thank you for your time. We look forward to your reply.

AyanoClarke commented 1 year ago

After re-basecalling with guppy, the problem is solved. The command is:

guppy_basecaller -i fast5 -c dna_r9.4.1_450bps_hac_prom.cfg -s basecalling --fast5_out

Then use the fast5 files in basecalling/workspace for the next analysis. It works.

The structure of the new fast5 file is:

> h5ls -r basecalling/workspace/b82c9dc6-29df-4807-8c88-9911a29f503c.fast5
#/                        Group
#/Analyses                Group
#/Analyses/Basecall_1D_000 Group
#/Analyses/Basecall_1D_000/BaseCalled_template Group
#/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
#/Analyses/Basecall_1D_000/BaseCalled_template/Move Dataset {956}
#/Analyses/Basecall_1D_000/Summary Group
#/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
#/Analyses/RawGenomeCorrected_000 Group
#/Analyses/RawGenomeCorrected_000/BaseCalled_template Group
#/Analyses/RawGenomeCorrected_000/BaseCalled_template/Alignment Group
#/Analyses/RawGenomeCorrected_000/BaseCalled_template/Events Dataset {402}
#/Analyses/Segmentation_000 Group
#/Analyses/Segmentation_000/Summary Group
#/Analyses/Segmentation_000/Summary/segmentation Group
#/Raw                     Group
#/Raw/Reads               Group
#/Raw/Reads/Read_8        Group
#/Raw/Reads/Read_8/Signal Dataset {4963/Inf}
#/UniqueGlobalKey         Group
#/UniqueGlobalKey/channel_id Group
#/UniqueGlobalKey/context_tags Group
#/UniqueGlobalKey/tracking_id Group

Is there any way for tombo to work without the Analyses group?

AyanoClarke commented 1 year ago

Another solution is to pass the sequencing summary file to tombo preprocess annotate_raw_with_fastqs.
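
A minimal sketch of that call, wrapped in a hypothetical shell function so the three paths are explicit. The summary file name is an assumption (use whatever your basecaller wrote), and `--sequencing-summary-filenames` is the tombo option I believe takes it. Please confirm with `tombo preprocess annotate_raw_with_fastqs --help` on your version.

```shell
# Hypothetical wrapper: annotate FAST5s using a FASTQ plus a sequencing summary.
#   $1 = FAST5 base directory
#   $2 = FASTQ file
#   $3 = sequencing summary file (name is an assumption, e.g. sequencing_summary.txt)
annotate_with_summary() {
  fast5_dir=$1
  fastq=$2
  summary=$3
  tombo preprocess annotate_raw_with_fastqs \
      --fast5-basedir "$fast5_dir" \
      --fastq-filenames "$fastq" \
      --sequencing-summary-filenames "$summary"
}
```

For the example in this thread that would be `annotate_with_summary fast5/ b82c9dc6-29df-4807-8c88-9911a29f503c.fastq sequencing_summary.txt`.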

GuardSkill commented 1 year ago

I have the same problem: no annotation can be added to the fast5 files. I basecall with dorado, and its input is multi-read pod5 files. Could this be the reason?

nanoporetech / tombo

tombo preprocess can not attach basecall information #416