nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

"Reads do not contain basecalls" error. #205

Closed nxiao6gt closed 5 years ago

nxiao6gt commented 5 years ago

Hi, I ran the example command

tombo resquiggle path/to/fast5s/ genome.fasta --processes 4 --num-most-common-errors 5

and received the following error:

[15:45:46] Loading minimap2 reference.
[15:45:46] Getting file list.
**** ERROR **** Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use tombo annotate_raw_with_fastqs to add basecalls from FASTQ files to raw FAST5 files.

What does this mean? What are basecalls in this context? I have all the FAST5 files in the path/to/fast5s/ folder, along with the reference FASTA file.

Best regards,

marcus1487 commented 5 years ago

To run a Tombo analysis, both the raw signal and the associated canonical basecalls are required. The basecalls are produced from the raw signal either during the run by MinKNOW or afterwards by the Guppy software.

Guppy can optionally write basecalls into its FAST5 output. If only the standard FASTQ output was produced, the basecalls can be added to a set of raw FAST5 files using the tombo preprocess annotate_raw_with_fastqs command.
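As a rough illustration of that annotation step, here is a minimal Python sketch that assembles the command line. The flags are the ones that appear in this thread (--fast5-basedir, --fastq-filenames, --processes, --overwrite). The helper name annotate_command and the file name basecalls.fastq are made up for the example, and the sketch only prints the command rather than running it.

```python
import shlex


def annotate_command(fast5_dir, fastq_files, processes=1, overwrite=False):
    """Build the argv list for `tombo preprocess annotate_raw_with_fastqs`.

    Hypothetical helper: only flags quoted in this thread are used.
    """
    argv = [
        "tombo", "preprocess", "annotate_raw_with_fastqs",
        "--fast5-basedir", fast5_dir,
        "--fastq-filenames", *fastq_files,
        "--processes", str(processes),
    ]
    if overwrite:
        # Replace basecalls that already exist in the FAST5 files.
        argv.append("--overwrite")
    return argv


# Placeholder paths; substitute your own FAST5 directory and FASTQ files.
cmd = annotate_command("path/to/fast5s/", ["basecalls.fastq"], processes=4)
print(shlex.join(cmd))
```

To actually run the command, the list can be passed to subprocess.run(cmd, check=True) in an environment where tombo is installed.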

Hopefully this is enough information to get you started with your tombo analysis, but please do post if you have further questions regarding your analysis.

AnimalGenomeInstitute commented 5 years ago

Hi!

I ran into the same issue and ran tombo preprocess annotate_raw_with_fastqs as shown below, but nothing seems to have changed. As expected, the tombo resquiggle attempt after this also failed. What could be the reason? The FAST5 and FASTQ files are the ones I took from the *_pass folders created by MinKNOW.

Thank you!

$ tombo preprocess annotate_raw_with_fastqs --overwrite --fast5-basedir ./ --fastq-filenames ./*.fastq --processes 8
[12:57:49] Preparing reads and extracting read identifiers.
** WARNING ** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
** WARNING ** Invalid warning code encountered.
** WARNING ** Invalid warning code encountered.
** WARNING ** Invalid warning code encountered.
** WARNING ** Invalid warning code encountered.
** WARNING ** Invalid warning code encountered.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 64.27it/s]
[12:57:49] Annotating FAST5s with sequence from FASTQs.
** WARNING ** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:00, ?it/s]
[12:57:50] Added sequences to a total of 0 reads.

marcus1487 commented 5 years ago

I have just pushed a fix for the erroneous "Invalid warning code encountered" messages. These are just additional warnings that basecalls exist and that the --overwrite flag has not been set.

As for the issue itself, the first warning message indicates that some of the reads already contain basecalls. Setting the --overwrite flag will replace all basecalls in the FAST5 files with those from the FASTQ files. The second warning indicates that some of the FASTQ reads have no corresponding read identifiers in the specified FAST5 files. This suggests that the called reads may come either from a different run or from a different subset of reads from the same run.
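One way to check the second warning by hand is to compare the read identifiers in the FASTQ records with those in the FAST5 files. The Python sketch below is only an illustration: the helper names fastq_read_ids and compare_read_ids are made up, it assumes unwrapped four-line FASTQ records with the read identifier as the first token of the header, and it takes the FAST5 read identifiers as a ready-made set (for example, copied from the read_id column of a sequencing summary file) instead of reading the FAST5 files directly.

```python
def fastq_read_ids(lines):
    """Collect read identifiers from FASTQ lines (assumes 4 lines per record)."""
    ids = set()
    for index, line in enumerate(lines):
        # Header lines sit at positions 0, 4, 8, ...; using the position avoids
        # mistaking a quality line that happens to start with "@" for a header.
        if index % 4 == 0 and line.startswith("@"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def compare_read_ids(fastq_ids, fast5_ids):
    """Split identifiers into shared, FASTQ-only and FAST5-only sets."""
    return {
        "matched": fastq_ids & fast5_ids,
        "fastq_only": fastq_ids - fast5_ids,
        "fast5_only": fast5_ids - fastq_ids,
    }


# Toy data standing in for a real FASTQ file and a real set of FAST5 read ids.
fastq_text = (
    "@read-a runid=x\nACGT\n+\n!!!!\n"
    "@read-b runid=x\nTTGA\n+\n!!!!\n"
)
fast5_ids = {"read-a", "read-c"}

report = compare_read_ids(fastq_read_ids(fastq_text.splitlines()), fast5_ids)
for name, ids in report.items():
    print(name, sorted(ids))
```

If "matched" comes back empty, as in the log above where 0 reads were annotated, the FASTQ and FAST5 files most likely do not describe the same reads.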

To narrow down the issue, could the set of reads being annotated be re-called with Guppy to ensure matching basecalls? Also, could the output from the failing resquiggle call be posted to help identify any further issues?

violet-everygarden commented 5 days ago

Hi, I initially ran this command and it worked:

command:
guppy_basecaller --input_path fast5_pass/ --save_path guppy --num_callers 5 --recursive --fast5_out --flowcell FLO-MIN106 --kit SQK-RNA002 -x auto

result:
runners per device: 4
Found 379 fast5 files to process.
Init time: 2437 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Caller time: 11059393 ms, Samples called: 103984224598, samples/s: 9.40234e+06
Finishing up any open output files.
Basecalling completed successfully.

Then I ran the next command:

command:
conda activate tombo
tombo resquiggle --rna --overwrite guppy/workspace/ /share/sequence/genome/hg38_p13/gencode.v27.transcripts.fa --processes 40 --fit-global-scale --include-event-stdev

and received the following error:

error:
[15:54:19] Loading minimap2 reference.
[15:54:32] Getting file list.
**** ERROR **** Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use tombo preprocess annotate_raw_with_fastqs to add basecalls from FASTQ files to raw FAST5 files.

How can I solve this problem? Thank you.