nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

all reads fail during re-squiggle #163

Closed lappazos closed 5 years ago

lappazos commented 5 years ago

All reads get "Alignment not produced". I'm having trouble understanding whether the problem is in the reference file (I'm sequencing direct-RNA samples and used 'GRCh38_latest_rna.fna') or in the mappy API (I have never worked with the API).

the command i used - tombo resquiggle '/media/lab/Windows/OFIR_DATA/Lior/Dox-minus-DirectRNA-25-12-18/GA50000/reads' '/media/lab/Windows/OFIR_DATA/Lior/GRCh38_latest_rna.fna' --rna --processes 2 --overwrite --failed-reads-filename failed_dox_minus_resquiggle --num-most-common-errors 10 --ignore-read-locks

marcus1487 commented 5 years ago

This is most likely an issue with the agreement of the source reads and the provided reference. This error indicates that the mappy API did not produce an alignment from this read sequence mapped to this reference. Other mappy related errors would be caught at different locations within the code and produce different error messages.

It is important to note that spliced RNA data must be mapped to a transcriptome reference (not sure if that is the file specified here). This is the most common error mode for tombo RNA experiments. If this is indeed the transcriptome reference, then this is likely an issue of the reads mapping to the reference. This can have a variety of causes, but these are upstream of tombo. To confirm this result, reads should not map to this reference from the standard command line minimap2 interface. If these reads map to this reference via the minimap2 command line (with the standard map-ont preset setting), then further issues within tombo could be investigated.

lappazos commented 5 years ago

could you please elaborate on how exactly do i check in the standard command line minimap2 interface

marcus1487 commented 5 years ago

If you have the FASTA or FASTQ base calls that are loaded into this set of FAST5 reads, then you can run minimap2 -ax map-ont GRCh38_latest_rna.fna reads.fastq > reads.sam. If this produces valid mappings and these are the exact basecalls found in the FAST5 files (confirmed by using the tombo preprocessing commands), then there may be an issue with the tombo results. If no mappings are produced then the issue is upstream from tombo and upstream of the mapping step specifically.

lappazos commented 5 years ago

the command line can't detect minimap2... is it possible i dont have it on my computer? other command line variation might work?

marcus1487 commented 5 years ago

Here is the minimap2 package referenced in this command: https://github.com/lh3/minimap2.

With tombo installed, minimap2.py -x map-ont GRCh38_latest_rna.fna reads.fastq > reads.paf should work.

lappazos commented 5 years ago

ok i managed to run it and got a file, how do i make sure its valid? any chance i can send them to you to take a look?

marcus1487 commented 5 years ago

Are there mapping records in the file? If you could post the first couple lines in the file that would help.

Here are details for the PAF file format (https://github.com/lh3/miniasm/blob/master/PAF.md).

lappazos commented 5 years ago

@SQ SN:NM_000014.5 LN:4945
@SQ SN:NM_000015.2 LN:1317
@SQ SN:NM_000016.5 LN:2623
@SQ SN:NM_000017.3 LN:1964
@SQ SN:NM_000018.3 LN:2296
@SQ SN:NM_000019.3 LN:2149
@SQ SN:NM_000020.2 LN:4263
@SQ SN:NM_000021.3 LN:6107
@SQ SN:NM_000022.3 LN:1574
@SQ SN:NM_000023.3 LN:1461
@SQ SN:NM_000024.5 LN:2058
@SQ SN:NM_000025.2 LN:2660
@SQ SN:NM_000026.3 LN:1757
@SQ SN:NM_000027.3 LN:2113
@SQ SN:NM_000028.2 LN:7449
@SQ SN:NM_000029.3 LN:2587
@SQ SN:NM_000030.2 LN:1611
@SQ SN:NM_000031.5 LN:3200
@SQ SN:NM_000032.4 LN:2044
@SQ SN:NM_000033.3 LN:3697
@SQ SN:NM_000034.3 LN:2408
@SQ SN:NM_000035.3 LN:2426
@SQ SN:NM_000036.2 LN:2426
@SQ SN:NM_000037.3 LN:8300
@SQ SN:NM_000038.5 LN:10740
@SQ SN:NM_000039.2 LN:1239
@SQ SN:NM_000040.2 LN:567
@SQ SN:NM_000041.3 LN:1234
@SQ SN:NM_000042.2 LN:1216
@SQ SN:NM_000043.5 LN:3951
@SQ SN:NM_000044.4 LN:10070

marcus1487 commented 5 years ago

Ah you got the SAM format version working. Could you run without the a flag (minimap2 -x map-ont GRCh38_latest_rna.fna reads.fastq > reads.paf).

lappazos commented 5 years ago

6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - NM_014903.5 9774 5628 5682 42 56 0 tp:A:P cm:i:5 s1:i:42 s2:i:42 dv:f:0.0584
6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - XM_011538940.2 13718 9588 9642 42 56 0 tp:A:S cm:i:5 s1:i:42 dv:f:0.0584
6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - XM_017020172.1 8629 4499 4553 42 56 0 tp:A:S cm:i:5 s1:i:42 dv:f:0.0584
6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - XM_017020164.2 13807 9677 9731 42 56 0 tp:A:S cm:i:5 s1:i:42 dv:f:0.0584
6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - XM_017020175.1 8442 4312 4366 42 56 0 tp:A:S cm:i:5 s1:i:42 dv:f:0.0584
6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - XM_017020173.2 12630 8500 8554 42 56 0 tp:A:S cm:i:5 s1:i:42 dv:f:0.0584
8014e28a-db06-4c0a-bf42-e37571e1583c 1747 720 769 - NM_001310121.1 3993 2685 2734 49 49 0 tp:A:P cm:i:9 s1:i:49 s2:i:49 dv:f:0.0134
8014e28a-db06-4c0a-bf42-e37571e1583c 1747 720 769 - NR_132311.1 4651 3418 3467 49 49 0 tp:A:S cm:i:9 s1:i:49 dv:f:0.0134
8014e28a-db06-4c0a-bf42-e37571e1583c 1747 720 769 - NM_002973.3 4712 3479 3528 49 49 0 tp:A:S cm:i:9 s1:i:49 dv:f:0.0134
d6f03e4b-9945-4997-a633-01f490644fae 1299 48 466 + XR_001745775.1 2431 231 642 93 424 60 tp:A:P cm:i:7 s1:i:90 s2:i:0 dv:f:0.1649
d6f03e4b-9945-4997-a633-01f490644fae 1299 371 806 + XM_024452510.1 1210 525 1015 85 491 0 tp:A:P cm:i:6 s1:i:73 s2:i:73 dv:f:0.1775

lappazos commented 5 years ago

From looking at the 10th and 11th columns, it seems that the alignment succeeded, no? It seems like there is a problem with tombo.

marcus1487 commented 5 years ago

These alignments seem very short though (all less than 10% of the query length; field 10 divided by field 2). So this may be a default thresholds issue. The other issue is that the alignments all seem to be to multiple locations with the same mapping statistics. This is an issue with RNA as many transcripts may have portions of the same sequence originating from the same exons. This may be an issue with the mappy best_n=1 setting used in tombo.
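As a rough illustration of the check described above, the following Python sketch computes field 10 divided by field 2 for each PAF record and counts how many records each read has. The function name `summarize_paf` is hypothetical and not part of tombo or minimap2; the sample records are copied from the PAF output posted earlier in this thread, and the code assumes whitespace-separated columns in the order given by the PAF format description.

```python
from collections import defaultdict


def summarize_paf(paf_lines):
    """Return {read_id: (best_match_fraction, n_records)} for PAF records."""
    best_fraction = defaultdict(float)
    n_records = defaultdict(int)
    for line in paf_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        read_id = fields[0]
        query_len = int(fields[1])  # PAF field 2: query sequence length
        n_match = int(fields[9])    # PAF field 10: number of matching bases
        fraction = n_match / query_len
        best_fraction[read_id] = max(best_fraction[read_id], fraction)
        n_records[read_id] += 1
    return {rid: (best_fraction[rid], n_records[rid]) for rid in n_records}


# Three records copied from the output posted in this thread.
example = [
    "6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - NM_014903.5 9774 5628 5682 42 56 0 tp:A:P cm:i:5 s1:i:42 s2:i:42 dv:f:0.0584",
    "6518e6ac-f973-48c4-8727-6a6e2aee0316 857 51 107 - XM_011538940.2 13718 9588 9642 42 56 0 tp:A:S cm:i:5 s1:i:42 dv:f:0.0584",
    "d6f03e4b-9945-4997-a633-01f490644fae 1299 48 466 + XR_001745775.1 2431 231 642 93 424 60 tp:A:P cm:i:7 s1:i:90 s2:i:0 dv:f:0.1649",
]

for read_id, (fraction, count) in summarize_paf(example).items():
    print(f"{read_id}\t{fraction:.3f}\t{count}")
```

A read whose best fraction is well below 0.1 and which has several records with identical statistics would match the pattern described in this comment.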

Before we check these items though, can you confirm 2 points? 1) Are these FASTA/FASTQ basecalls the ones loaded into these FAST5 read files? This can be confirmed by running tombo preprocess annotate_raw_with_fastqs and then re-running the re-squiggle command. 2) Is the same version of minimap2 installed via the command line and API (python -c "import mappy; print(mappy.__version__)" and minimap2 --version)?

lappazos commented 5 years ago

Regarding tombo preprocess annotate_raw_with_fastqs, I ran it from the beginning, before re-squiggling (the system asked me to). Regarding the versions: minimap2 --version gave me 2.13-r860-dirty, and python -c "import mappy; print(mappy.__version__)" gave me 2.13.
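The two version strings reported here differ only in a build suffix. A minimal sketch of how they could be compared is shown below; the helper names `release_number` and `same_release` are hypothetical, and the sketch assumes the suffix is everything from the first hyphen onward.

```python
def release_number(version_string):
    """Drop a build suffix such as '-r860-dirty' from a version string."""
    return version_string.strip().split("-")[0]


def same_release(cli_version, api_version):
    """True when the command line and mappy versions share a release number."""
    return release_number(cli_version) == release_number(api_version)


# Values reported in this thread.
print(same_release("2.13-r860-dirty", "2.13"))  # prints True
```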

lappazos commented 5 years ago

?

marcus1487 commented 5 years ago

If you could test one of the reads (one which mapped successfully with command line minimap2) with the following snippet of Python code, this would help to identify the issue. This may be an issue with the best_n=1 setting, so if this code produces a StopIteration error, please test it without the best_n option as well. Hopefully this will help to identify the issue at hand here.

import mappy

reference_fn = 'GRCh38_latest_rna.fna'
# paste the bases of a single read here (the sequence itself, not a file path)
read_seq = ''

# best_n=1 keeps only the top-scoring alignment for the read
aligner = mappy.Aligner(reference_fn, preset='map-ont', best_n=1)
# next() raises StopIteration if no alignment is produced
alignment = next(aligner.map(read_seq))

marcus1487 commented 5 years ago

Was this snippet helpful in resolving this issue?

marcus1487 commented 5 years ago

Has this snippet of code identified the problem presented here?

lappazos commented 5 years ago

Hi, I'm deeply sorry for my disappearance. I had personal issues that kept me from work for the past month, but I'm back :) So I did what you asked, and it did raise a StopIteration error. I tried again without the flag this time, and StopIteration occurred again -

import mappy
reference_fn = '/media/lab/Windows/OFIR_DATA/Lior/GRCh38_latest_rna.fna'
read_seq = '/media/lab/Windows/OFIR_DATA/Lior/Dox-minus-DirectRNA-25-12-18/GA50000/fastq_13.fastq'
aligner = mappy.Aligner(reference_fn, preset='map-ont', best_n=1)
alignment = next(aligner.map(read_seq))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
aligner = mappy.Aligner(reference_fn, preset='map-ont')
alignment = next(aligner.map(read_seq))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Just to make sure, a minute earlier I ran the following command on the same fastq file - '/home/lab/minimap2/python/minimap2.py' -x map-ont '/media/lab/Windows/OFIR_DATA/Lior/GRCh38_latest_rna.fna' '/media/lab/Windows/OFIR_DATA/Lior/Dox-minus-DirectRNA-25-12-18/GA50000/fastq_13.fastq' > reads5.paf

and received this -

1a69a23e-92c9-4df7-8c4a-4cea17eb46a7 1724 648 692 - NM_213589.2 9749 33 77 44 44 0 tp:A:P ts:A:. cg:Z:44M
1a69a23e-92c9-4df7-8c4a-4cea17eb46a7 1724 648 692 - NM_203365.3 2782 33 77 44 44 0 tp:A:S ts:A:. cg:Z:44M
1a69a23e-92c9-4df7-8c4a-4cea17eb46a7 1724 648 692 - NM_001329728.1 3363 34 78 44 44 0 tp:A:S ts:A:. cg:Z:44M
1a69a23e-92c9-4df7-8c4a-4cea17eb46a7 1724 648 692 - XM_011511646.3 9824 20 64 44 44 0 tp:A:S ts:A:. cg:Z:44M
8e4145ac-5e6a-481c-9b80-47c7b16bafa9 623 396 443 - XM_011511646.3 9824 17 64 46 47 0 tp:A:P ts:A:. cg:Z:47M
8e4145ac-5e6a-481c-9b80-47c7b16bafa9 623 396 443 - NM_203365.3 2782 30 77 46 47 0 tp:A:S ts:A:. cg:Z:47M
8e4145ac-5e6a-481c-9b80-47c7b16bafa9 623 396 443 - NM_001329728.1 3363 31 78 46 47 0 tp:A:S ts:A:. cg:Z:47M
8e4145ac-5e6a-481c-9b80-47c7b16bafa9 623 396 443 - NM_213589.2 9749 30 77 46 47 0 tp:A:S ts:A:. cg:Z:47M

marcus1487 commented 5 years ago

In the code I sent the read_seq should be the actual sequence and not the filename. For example you could test just the sequence from one of the reads that maps from the command line (e.g. 1a69a23e-92c9-4df7-8c4a-4cea17eb46a7).

Note that this should be just the read sequence and not the fasta/fastq formatted sequence record.
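To make the distinction concrete, here is a small sketch of pulling the bare sequence for one read out of a FASTQ file before handing it to the aligner. The helper names `fastq_sequence` and `first_alignment` are hypothetical (they are not part of mappy or tombo), and the parser assumes plain four-line FASTQ records. `first_alignment` only relies on the aligner having a `map` method that yields hits, as in the snippet posted earlier in this thread.

```python
def fastq_sequence(fastq_lines, read_id):
    """Return the base sequence for read_id from four-line FASTQ records, or None."""
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines, "").strip()
        next(lines, None)  # '+' separator line
        next(lines, None)  # quality string
        name = header[1:].split()
        if header.startswith("@") and name and name[0] == read_id:
            return seq
    return None


def first_alignment(aligner, read_seq):
    """Return the first hit from aligner.map(read_seq), or None when nothing maps."""
    return next(aligner.map(read_seq), None)


# Tiny in-memory FASTQ used only to demonstrate the parser.
demo = ["@read_a runid=x\n", "ACGT\n", "+\n", "!!!!\n",
        "@read_b\n", "TTGA\n", "+\n", "!!!!\n"]
print(fastq_sequence(demo, "read_b"))  # prints TTGA

# Usage with mappy (needs the mappy package and the real files):
# import mappy
# aligner = mappy.Aligner("GRCh38_latest_rna.fna", preset="map-ont", best_n=1)
# with open("reads.fastq") as fastq:
#     seq = fastq_sequence(fastq, "1a69a23e-92c9-4df7-8c4a-4cea17eb46a7")
# print(first_alignment(aligner, seq))
```

Returning None instead of letting StopIteration propagate makes the "no alignment" case easier to tell apart from other failures.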

lappazos commented 5 years ago

In both cases, with or without best_n=1, print(alignment) produces the same output for the sequence you mentioned -

648 692 - NM_213589.2 9749 33 77 44 44 0 tp:A:P ts:A:. cg:Z:44M

lappazos commented 5 years ago

no stop iteration occurred. the sequence is TGGAATTCCCCGGGGGAGGAACGGGAGGGGGAGGTAAGAAGGAAAGAGGTACGTGTTCTGGGCTGAGGGGTGGGGTGGCTGGGGAGGGGCTTAAAATGCCCAGCGTGGTAAGAAGCTAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGGCTAATGACTGGCTAATGACTGGCTAATGACTGGCTAATGGCTAATGGCTAATGGCTAATGGCTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGGCTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGAGCTAATGAGCTAATGAGCTAATGAGCTAATGAGCTAATGAATGAATGAATGAATGAATGAATGAATGAATGGGAAGAAAGTAAAAATAGGGTGAGAAAAAGGGGTAGAACTCAATATTGAAAGAGCTAAGTAAGCCCTGGCCTGCTAATTAGGCAAATGGGGCTGTGACTCCATGGATGTGTGCCAGGAGGGCCCTAAAATACTGGTAGGGTAAATGAAGCTAAGGTGAGGAGGAGAAAAGAAAAATTTAAGAAAGAGGTAGGAGAGTCCCTCAGCTAAATGTAAATGTCAAAATAATGAAGCAATGGCAAGGGCTTGTAGAGGCGAAATAAGGTATAAGGCGTGGGAGGTAGTAAATGTGGGGGTAAGGTAAATGGGAGCTCCAGGAGGCAGCTAATATGAGAAATAGCAAGCCGTGAGGAGCTGGCAAGGTATATGTCACTTATATATAGCAAGCTGTGTAGAAAGAGTGCCCAAGAAAGGTACTTGGTATAATTACATAAAAAAGGGTGAAACGTGAAATTAGAAAATAAAGGGGTGCCCTCCCAGGGGTGCAGTATAAAAGGCCAGGCAAGCCTCATAGTAAAGAAAAATAGAGGTGGGAGGTGGCAAAATTCTGTATATGTCCCTTCTCTGAGGTAAGAGGTGGAGAGTAGAGAAGTTAAGGTAGCTGACAGGAGATGTGCTTATGTAAAACTCATCAGAACCCTGCAGGGAGGGTGAGTGAGGAGATACTCCAGGTAAATGGGGTGGGG

marcus1487 commented 5 years ago

Given that this read sequence maps via the mappy interface, it seems that the issue lies in getting this sequence from the FAST5 to the mappy interface. Could you check that this is the exact sequence loaded into the FAST5 for this read? You can check this with the following command: h5dump -d Analyses/Basecall_1D_000/BaseCalled_template/Fastq read.fast5

lappazos commented 5 years ago

this what came up- HDF5 "/media/lab/Windows/OFIR_DATA/Lior/Dox-minus-DirectRNA-25-12-18/GA50000/reads/13/GXB01249_20181225_FAK22177_GA50000_sequencing_run_Dox_minus_DirectRNA_25_12_18_83725_read_2267_ch_61_strand.fast5" { DATASET "Analyses/Basecall_1D_000/BaseCalled_template/Fastq" { DATATYPE H5T_STRING { STRSIZE H5T_VARIABLE; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; } DATASPACE SCALAR DATA { (0): "@1a69a23e-92c9-4df7-8c4a-4cea17eb46a7 runid=77c431d44d7b4387e2fb9a19d781b4091b9142bb sampleid=Dox-minus-DirectRNA-25-12-18 read=2267 ch=61 start_time=2018-12-25T13:27:05Z TGGAATTCCCCGGGGGAGGAACGGGAGGGGGAGGTAAGAAGGAAAGAGGTACGTGTTCTGGGCTGAGGGGTGGGGTGGCTGGGGAGGGGCTTAAAATGCCCAGCGTGGTAAGAAGCTAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGGCTAATGACTGGCTAATGACTGGCTAATGACTGGCTAATGGCTAATGGCTAATGGCTAATGGCTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGGCTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGACTAATGAGCTAATGAGCTAATGAGCTAATGAGCTAATGAGCTAATGAATGAATGAATGAATGAATGAATGAATGAATGGGAAGAAAGTAAAAATAGGGTGAGAAAAAGGGGTAGAACTCAATATTGAAAGAGCTAAGTAAGCCCTGGCCTGCTAATTAGGCAAATGGGGCTGTGACTCCATGGATGTGTGCCAGGAGGGCCCTAAAATACTGGTAGGGTAAATGAAGCTAAGGTGAGGAGGAGAAAAGAAAAATTTAAGAAAGAGGTAGGAGAGTCCCTCAGCTAAATGTAAATGTCAAAATAATGAAGCAATGGCAAGGGCTTGTAGAGGCGAAATAAGGTATAAGGCGTGGGAGGTAGTAAATGTGGGGGTAAGGTAAATGGGAGCTCCAGGAGGCAGCTAATATGAGAAATAGCAAGCCGTGAGGAGCTGGCAAGGTATATG
TCACTTATATATAGCAAGCTGTGTAGAAAGAGTGCCCAAGAAAGGTACTTGGTATAATTACATAAAAAAGGGTGAAACGTGAAATTAGAAAATAAAGGGGTGCCCTCCCAGGGGTGCAGTATAAAAGGCCAGGCAAGCCTCATAGTAAAGAAAAATAGAGGTGGGAGGTGGCAAAATTCTGTATATGTCCCTTCTCTGAGGTAAGAGGTGGAGAGTAGAGAAGTTAAGGTAGCTGACAGGAGATGTGCTTATGTAAAACTCATCAGAACCCTGCAGGGAGGGTGAGTGAGGAGATACTCCAGGTAAATGGGGTGGGG + "&&$$$&%$%).%(($$$#%$"$"""$$''""""#&#&'%%'&%$)(&%('%#$$##%'$#"$#'&&&"&))+&($##'##""##$%'((0)&"""""%&&'##$#%&'''#"$#+#"%#"""$$##$#""##"#%$##%$##%$##%$##$$"#$$""$$"#%$"#$$"#$"""#"""#"""#"""#""#$"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""#"""""""""""#"""#"""#"""""""#"""#"""#"""#"""""""#""""""""""""""""""""""""""#"""#"""#"""""""""""#"""$#""$#""$#""$"""$#""$#""$#""$#""$###$##"$##"$"#"$##"$##"$"#####"$##"$##"$"##$###$"###"###"###########"###########"###"#######################################################################################################"######################"#"#"#"#"#"#"#"#"""#"#"#"#"#"""#"#"#"#"#"#"#"#"#"""#"""#"""#"""#"""#"""#"""#"""#"""#""""""""""#"""""""""#$""""""""""$"""#"#$"""#"#$"""#"#$"""#"#$"""#"#""""##"#"""#""""""#""""""#""""""#""""""#""""""#""""""#""""""$#"#"""##"%""#$"##"""#"""""#$#"#""#$#"#"""#"""""#%#"#""#$""#""##""#"""#"""""#$""#"""#""""""#""""""#""""""#""""""#""""""#""""""#""#"""#""#"""#""#"""#""#""""#""#""""#"##""""#"##""""#"##""""#"##"""#"""#"""#"""#"""#"""#"""""""#(+)(%(&"""#$'%%&&$$'%%&+0.(&%&$"$$#$$$#"##""""%('$&$##&%""##""#"#"&$#""""##+#$$$#%))'&+'$"""$$%(*$##""%&%$&#"""##$"%&"""""$%&#$""""#(&&)&&##$%'&$"#"$%%%%&((($'"""$%1#))'&%$$"/.".,$"$#$&'+)%#$#$#"$"""""#"####'&%$"$%),+$')""##"&&#"""""$)'("""""&%%&%&#$%&%)%$'%$&$#%"%$$&#&''&$%$%%#""$$(&%%'&%"#'($$"""#"$$%%'"""""&""##%$$$"""""#$')"$"#"###"#""##"%$%%#$"$$#"$"%"#$#%%##"$"%"("###""####%(&+."%#"#"#"#&"&%$"""$%'"$""""#$&"$#%#&$$%%$'+''%&$$""""$$$##""$&(#"#$#)''$"""""""#&#&''&%%"#""#"%+)%'%"""""""#"""""""##"&$"$%%0(+######%'("((%"""$$%$%#$$#%#$"$"$('#%#"##"""#$$('#%%$##%$'#%&#""""""""%$"#%$"#"""""$$##$#""""*)$"""%'&($$%'%$%###""""$"#&(&$$#'"'"%#
$#$#"#""""#$$""#""''"&'%$"$$#" " } } }

lappazos commented 5 years ago

identical

marcus1487 commented 5 years ago

This is very unexpected behavior indeed. Could you share this raw FAST5 read and point to the location for the reference file being processed as I cannot reproduce this error with any data I have seen?

Alternatively, could you run the resquiggle command on a directory with just this read and print the sequence provided to the mappy code within the tombo framework? This would involve the following steps:

  1. Add this line of code (print(seq_data.seq)) above this line in the codebase
  2. Re-install using pip install . from the repo root
  3. Run the command line resquiggle command against a directory with just this read of interest.

marcus1487 commented 5 years ago

Were you able to run this testing?

lappazos commented 5 years ago

attached

this is the reference

(ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_rna.fna.gz)

this is the fast5 file

test.zip

lappazos commented 5 years ago

also, i suggest maybe we can schedule for today or tomorrow a time where u can log in remotely to my computer using teamviewer and check everything live. so we wont have to keep waiting for each other answers.

marcus1487 commented 5 years ago

I am unable to reproduce the reported error. I have run the below commands and received the output below. The Poor raw to expected signal matching error indicates that a successful alignment was completed for this read. Let me know if this set of commands gives an error on your system.

# download read
wget https://github.com/nanoporetech/tombo/files/3256706/test.zip
unzip test.zip
mkdir read
mv GXB01249_20181225_FAK22177_GA50000_sequencing_run_Dox_minus_DirectRNA_25_12_18_83725_read_2267_ch_61_strand.fast5  read/

# download reference
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_rna.fna.gz
gunzip GRCh38_latest_rna.fna.gz

# prepare conda env
conda create -n tombo_test python=3.6
conda activate tombo_test
conda install ont-tombo

# resquiggle read
tombo resquiggle read/ GRCh38_latest_rna.fna --rna

Output:

[06:24:25] Loading minimap2 reference.
[06:24:39] Getting file list.
[06:24:39] Re-squiggling reads (raw signal to genomic sequence alignment).
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.99it/s]
[06:24:40] Final unsuccessful reads summary (100.0% reads unsuccessfully processed; 1 total reads):
   100.0% (      1 reads) : Poor raw to expected signal matching (revert with `tombo filter clear_filters`) 
[06:24:40] Saving Tombo reads index to file.

lappazos commented 5 years ago

It produces the same error for me as well. But still, when running the whole set of reads, it produces the alignment error. Is there a chance we could do what I suggested with TeamViewer? This gap has really set back our research, and I would like to solve it as quickly as possible.

marcus1487 commented 5 years ago

Tombo is offered with limited support. If you can provide a minimal reproducible error we can have a look to identify the issue.