nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

Question on "tombo resquiggle" #68

Closed weir12 closed 6 years ago

weir12 commented 6 years ago

Hi~ when I ran the following command:

    tombo resquiggle "/home/weir/output/workspace/pass/1/" "/home/weir/genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa" --processes 4

this error was reported:

    Traceback (most recent call last):
      File "/home/weir/software/python/Python-3.6.1/Python3/bin/tombo", line 11, in <module>
        load_entry_point('ont-tombo==1.3', 'console_scripts', 'tombo')()
      File "/home/weir/software/python/Python-3.6.1/Python3/lib/python3.6/site-packages/tombo/main.py", line 191, in main
        from . import resquiggle
      File "/home/weir/software/python/Python-3.6.1/Python3/lib/python3.6/site-packages/tombo/resquiggle.py", line 9, in <module>
        import mappy
    ImportError: /home/weir/software/python27/lib/python2.7/site-packages/lib/python2.7/site-packages/lib/python2.7/site-packages/mappy.so: undefined symbol: _Py_ZeroStruct

I installed Tombo with:

    pip3 install numpy
    pip3 install ont-tombo[full]

The installation reported no problems, and the Tombo command-line interface runs normally with "tombo -h".

marcus1487 commented 6 years ago

This appears to be a mappy installation error. Can you confirm this by running python -c "import mappy"? I think this should reproduce this error.

It also looks like you are installing this via pip3 (to a python3 installation), but the listed path to the mappy installation looks to be a python2.7 installation. My guess would be that this is probably your issue. pip3 should have handled this, but I know that pip can have trouble correctly linking installed packages.
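One way to see which installation each interpreter picks up is to ask it where it would load mappy from, without actually importing it. The following is a minimal sketch using only the standard library; the helper names locate_module and report are made up for illustration and are not part of Tombo or mappy.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    # find_spec only locates a top-level module; it does not execute it,
    # so a broken compiled extension will not raise ImportError here.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def report(name):
    """Describe which interpreter is running and where it finds `name`."""
    return "{} -> {} resolves to {}".format(sys.executable, name, locate_module(name))


if __name__ == "__main__":
    print(report("mappy"))
```

Saving this to a file and running it once with python and once with python3 should show whether the python3 interpreter is resolving mappy from a python2.7 site-packages directory.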

Hopefully this helps to resolve your issues. Best of luck!

weir12 commented 6 years ago

Interesting! When I ran python -c "import mappy", no error was reported. But when I ran python3 -c "import mappy", this error was shown:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: /home/weir/software/python27/lib/python2.7/site-packages/lib/python2.7/site-packages/lib/python2.7/site-packages/mappy.so: undefined symbol: _Py_ZeroStruct

I ran "pip3 install mappy" to fix this, but pip tells me that the module is already installed:

    Requirement already satisfied: mappy in ./software/python27/lib/python2.7/site-packages/lib/python2.7/site-packages/lib/python2.7/site-packages (2.10)

Thanks!

marcus1487 commented 6 years ago

You can add the -I flag to re-install a package with pip and ignore any previously installed version. So try pip3 install -I mappy. Hopefully that fixes your issue.

weir12 commented 6 years ago

Thanks for your help! I resolved the problem by running pip3 uninstall mappy. Maybe it was caused by an incompatible module having been installed.

weir12 commented 6 years ago

Hi marcus, sorry to bother you again. I have run into a new problem. When I ran

    tombo resquiggle "/home/weir/output/workspace/pass/1/" "/home/weir/genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa" --processes 4

the following output was displayed:

    [17:20:18] Loading minimap2 reference.
    [17:22:53] Getting file list.
    [17:22:53] Using default canonical RNA model.
    [17:22:56] Re-squiggling reads (raw signal to genomic sequence alignment).
    100%|█████████████████████████████████████████████████████████████████| 4000/4000 [09:12<00:00, 7.24it/s]
    [17:32:09] Failed reads summary (3741 total failed):
        Alignment not produced : 236
        Not enough raw signal around potential genomic deletion(s) : 49
        Poor raw to expected signal matching (revert with tombo clear_filters) : 3305
        Read event to sequence alignment extends beyond --bandwidth : 117
        Read failed sequence-based signal re-scaling parameter estimation. : 3
        Reference mapping contains non-canonical bases (transcriptome reference cannot contain U bases) : 31

The fast5 files were basecalled by Albacore and contain "Event" and "fastQ" information. The sample is human tissue mRNA.

The reference sequence was downloaded from the Ensembl database and is human genomic DNA.

I know that processing RNA requires a "transcriptome reference", but I don't know how to get a suitable reference sequence file for different samples.

Should I merge all the fastQ files generated by Albacore into a single file to use as my reference sequence file?

marcus1487 commented 6 years ago

For spliced samples, I would currently recommend using a cDNA (transcriptome) FASTA file. You may want to add the ncRNA FASTA file to this if non-coding transcripts are of interest as well. You can find these types of files on the Ensembl FTP site as well as several other repositories.
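As a rough sketch of combining the two files into a single reference, assuming both have already been downloaded (gzip-compressed or not): the function names below and the file names in the usage note are placeholders, not Tombo or Ensembl conventions.

```python
import gzip


def open_maybe_gzip(path):
    """Open a FASTA file for binary reading, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def merge_fastas(input_paths, output_path):
    """Concatenate FASTA files into one uncompressed reference file."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open_maybe_gzip(path) as src:
                last = b"\n"
                for line in src:
                    out.write(line)
                    last = line
                # Make sure the next file's first header starts on its own line.
                if not last.endswith(b"\n"):
                    out.write(b"\n")
```

For example, merge_fastas(["cdna.fa.gz", "ncrna.fa.gz"], "transcriptome.fa") would write one file that can then be passed to tombo resquiggle as the reference.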

This should help the processing of spliced transcripts, but the larger issue is the poor raw to expected signal levels error. I am actively working to fix this issue. You can find discussion about this issue and current workarounds in a couple of other issues (#55 and #63).

weir12 commented 6 years ago

Thanks, I followed your advice and downloaded a cDNA (transcriptome) FASTA file. To address the more serious issue, I ran the following command:

    tombo filter clear_filters --fast5-basedirs "/home/weir/output/workspace/pass/1/"

which printed:

    All filters successfully cleared!

    tombo filter raw_signal_matching --fast5-basedirs "/home/weir/output/workspace/pass/1/" --signal-matching-score 0.5

Maybe a lower threshold value can reduce errors in downstream processing? The output was:

    Filtered 2257 reads due to signal matching filter from a total of 2830 reads

I then entered the Python interface to find an appropriate threshold value:

    from tombo import tombo_helper
    run_data = tombo_helper.parse_fast5s(["/home/weir/output/workspace/pass/1/",], 'Basecall_1D_000', ['BaseCalled_template',])

    **** WARNING **** Tombo index file does not exist for one or more directories. If --skip-index was not set for re-squiggle command, ensure that the specified directory is the same as for the re-squiggle command.

This seems to be caused by the re-squiggle command not having been run.

So I ran

    tombo resquiggle "/home/weir/output/workspace/pass/1/" "/home/weir/transcriptome/Homo_sapiens.GRCh38.cdna.abinitio.fa" --processes 6 --overwrite

However, even though I adjusted the threshold, a similar error was reported:

    Failed reads summary (3931 total failed):
        Alignment not produced : 1052
        Not enough raw signal around potential genomic deletion(s) : 42
        Poor raw to expected signal matching (revert with tombo clear_filters) : 2770
        Read event to sequence alignment extends beyond --bandwidth : 65
        Read failed sequence-based signal re-scaling parameter estimation. : 2

marcus1487 commented 6 years ago

A couple of notes on the signal matching score:

  1. Lower scores indicate better matching, so increase the threshold in order to allow more reads into downstream analysis.
  2. The threshold is applied to the results of the re-squiggle command (or during it), so when re-squiggle is re-run the threshold must be set again. You can set a new threshold during the re-squiggle command by setting the --signal-matching-score there. See tombo resquiggle -h for more details.
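
To illustrate the first point with made-up numbers: the sketch below is not Tombo's code, the scores are hypothetical, and it assumes a read passes when its score is at or below the threshold (whether Tombo treats the boundary as inclusive is not stated here).

```python
def passing_reads(scores, threshold):
    """Return the read IDs whose score is at or below the threshold (lower is better)."""
    return sorted(read_id for read_id, score in scores.items() if score <= threshold)


# Hypothetical signal matching scores for four reads.
scores = {"read_a": 0.4, "read_b": 0.9, "read_c": 1.3, "read_d": 2.1}

strict = passing_reads(scores, 0.5)   # low threshold: few reads pass
lenient = passing_reads(scores, 1.5)  # higher threshold: more reads pass

print(len(strict), "reads pass at 0.5;", len(lenient), "reads pass at 1.5")
```

Every read that passes the stricter threshold also passes the more lenient one, which is why raising the value lets more reads through to downstream analysis.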

weir12 commented 6 years ago

Hi~

When I ran

    tombo plot max_coverage --fast5-basedirs "/home/weir/output/sixrun/MTB/workspace/pass/3" --plot-standard-model

this error was reported:

    **** ERROR **** R and rpy2 must be linked during installation. Run python -c "from rpy2 import robjects; from rpy2.robjects.packages import importr to identify installation linking issues. If using conda, check that R and rpy2 are installed from the same channel with conda list | grep "r-base|rpy2" (last columns should match).

However, when I ran

    [weir@node6 3]$ python -c "from rpy2 import robjects; from rpy2.robjects.packages import importr"

no error was reported, and my Python, R, and rpy2 (checked with "import rpy2") all work well.

thanks!

marcus1487 commented 6 years ago

This most often happens when your default python installation and the python installation linked during the Tombo installation are different. Thus the rpy2 installation that works here is not found when Tombo is run. Possibly try uninstalling and re-installing Tombo to make sure that it is linked to the same python version.
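A quick way to check for this mismatch is to compare the interpreter named on the first line of the installed tombo script with the interpreter you get by default. This is only a sketch using the standard library; script_interpreter is a hypothetical helper, and it assumes the tombo entry point is a pip-generated script with a shebang line.

```python
import shutil
import sys


def script_interpreter(script_path):
    """Return the interpreter named on a script's shebang line, or None if there is none."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


if __name__ == "__main__":
    tombo_path = shutil.which("tombo")  # None if tombo is not on PATH
    if tombo_path is None:
        print("tombo not found on PATH")
    else:
        print("tombo script :", tombo_path)
        print("runs under   :", script_interpreter(tombo_path))
        print("this python  :", sys.executable)
```

If the last two lines differ, the rpy2 check that succeeded was run under a different Python than the one Tombo uses.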