nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.
Other
230 stars 54 forks source link

Unsuccessful resquiggle for RNA #443

Open vgori opened 11 months ago

vgori commented 11 months ago

Hi all,

I have sequencing data from the ONT PromethION platform with the Kit SQK-RNA002 and R9 chemistry. Tombo pipeline has been launched under the conda environment with the following version of tools: numpy=1.19.5=py37h3e96413_3 python=3.7.12=hf930737_100_cpython mappy=2.22=py37h190e900_0 1
minimap2=2.22=h5bf99c6_0 which should correspond to the Tombo requirements.

I`ve tried to run resquiggle several ways: 1) with cDNA human reference genome directly:

tombo resquiggle /path/to/SingleFast5s/ \ /path/to/Homo_sapiens.GRCh38.cdna.all.fa \ --rna --processes 18 --overwrite --num-most-common-errors 5

2) with the previously created index.mmi file: tombo resquiggle /path/to/SingleFast5s/ \ /path/to/index_fast5.mmi \ --rna --processes 18 --overwrite --num-most-common-errors 5 Both ways return to me the result: 94.2% reads unsuccessfully processed

Alignment_not_prodused_94

3) through previously created aligned .sam file:

minimap2 -ax splice -uf -k14 /path/to/index_fast5.mmi \ /path/to/fastq_single_ControlA.fastq | samtools sort \ -o /path/to/aligned_ControlA.sam

tombo resquiggle /path/to/SingleFast5s/ \ /path/to/aligned_ControlA.sam \ --rna --processes 18 --overwrite --num-most-common-errors 5

This way returns me the result with error U bases and 100% reads unsuccessfully processed

Non-canonical_bases

Meantime checking aligned_ControlA.sam file with the samtools shows 10195 mapped reads from 15579, which means 65% should be successfully aligned.

Am I missing something? There are issues with the tool?

keenhl commented 11 months ago

I'm trying to a similar analysis using data generated using the RNA002 kit. I'm still stuck on the preprocess step of adding the fastq sequence to the fast5 files. If I get to the resquiggle step, I will come back and post an update.

yuukiiwa commented 11 months ago

Hi @marcus1487,

I got similar problem with running tombo resquiggle:

tombo resquiggle --rna --overwrite --processes 20 fast5_pass/0/ /mnt/dataHDD/wanyk/xpore_related/background_JSON/diffmod_outputs/Homo_sapiens.GRCh38.cdna.all.fa --num-most-common-errors 5
[07:49:24] Loading minimap2 reference.
[07:49:30] Getting file list.
[07:49:31] Re-squiggling reads (raw signal to genomic sequence alignment).
5 most common unsuccessful read types (approx. %):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
     -----
     -----
     -----
100%|█████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:20<00:00, 198.70it/s]
******************** WARNING ********************
        Unexpected errors occured. See full error stack traces for first (up to) 50 errors in "unexpected_tombo_errors.8338.err"
[07:49:51] Final unsuccessful reads summary (100.0% reads unsuccessfully processed; 4000 total reads):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
[07:49:51] Saving Tombo reads index to file.

tombo version: 1.5.1 numpy version: 1.19.0

tombo preprocess seems fine:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5_pass/0/ --fastq-filenames sample.fastq --overwrite
[07:44:58] Preparing reads and extracting read identifiers.
/home/wanyk/.local/lib/python3.8/site-packages/tombo/_preprocess.py:378: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  with h5py.File(fast5_fn) as fast5_data:
100%|█████████████████████████████████████| 4000/4000 [00:02<00:00, 1632.85it/s]
[07:45:01] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
100%|██████████████████████████████████████| 4000/4000 [00:23<00:00, 172.13it/s]
[07:45:24] Added sequences to a total of 4000 reads.

Thanks!

Best wishes, Yuk Kei

tweedle828 commented 10 months ago

Hi @marcus1487,

I got similar problem with running tombo resquiggle:

tombo resquiggle --rna --overwrite --processes 20 fast5_pass/0/ /mnt/dataHDD/wanyk/xpore_related/background_JSON/diffmod_outputs/Homo_sapiens.GRCh38.cdna.all.fa --num-most-common-errors 5
[07:49:24] Loading minimap2 reference.
[07:49:30] Getting file list.
[07:49:31] Re-squiggling reads (raw signal to genomic sequence alignment).
5 most common unsuccessful read types (approx. %):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
     -----
     -----
     -----
100%|█████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:20<00:00, 198.70it/s]
******************** WARNING ********************
        Unexpected errors occured. See full error stack traces for first (up to) 50 errors in "unexpected_tombo_errors.8338.err"
[07:49:51] Final unsuccessful reads summary (100.0% reads unsuccessfully processed; 4000 total reads):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
[07:49:51] Saving Tombo reads index to file.

tombo version: 1.5.1 numpy version: 1.19.0

tombo preprocess seems fine:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5_pass/0/ --fastq-filenames sample.fastq --overwrite
[07:44:58] Preparing reads and extracting read identifiers.
/home/wanyk/.local/lib/python3.8/site-packages/tombo/_preprocess.py:378: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  with h5py.File(fast5_fn) as fast5_data:
100%|█████████████████████████████████████| 4000/4000 [00:02<00:00, 1632.85it/s]
[07:45:01] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
100%|██████████████████████████████████████| 4000/4000 [00:23<00:00, 172.13it/s]
[07:45:24] Added sequences to a total of 4000 reads.

Thanks!

Best wishes, Yuk Kei

Hi Yuk Kei!

I just met the same issue as you did. I wonder if you find the solution for this problem?

Thanks.

marcus1487 commented 10 months ago

Tombo is deprecated and as such the issues here will not be addressed. We have released full support for signal analysis of RNA reads in Remora. I would strongly suggest you move your pipelines to the Remora signal analysis API.