nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

per_read_roc error #177

Closed JiahuiWangGit closed 5 years ago

JiahuiWangGit commented 5 years ago

Hi, there is an error when running per_read_roc:

genome_fasta="/projects/li-lab/reference/hg38/hg38.fasta"
tombo plot per_read_roc \
    --per-read-statistics-filenames mathylation_calls.5mC.tombo.per_read_stats \
    --modified-locations "Bisulfite Ground Truth":mod_pos.bed \
    --unmodified-locations unmod_pos.bed \
    --genome-fasta $genome_fasta --pdf-filename per_read_roc.pdf

[14:21:08] Extracting per-read statistics.
Traceback (most recent call last):
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/bin/tombo", line 11, in <module>
    sys.exit(main())
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/lib/python3.6/site-packages/tombo/main.py", line 279, in main
    _plot_commands.plot_main(args)
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/lib/python3.6/site-packages/tombo/_plot_commands.py", line 2363, in plot_main
    plot_per_read_roc(**kwargs)
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/lib/python3.6/site-packages/tombo/_plot_commands.py", line 295, in plot_per_read_roc
    'stat':r.FloatVector(unzip_stats[0]),
IndexError: list index out of range

Any thoughts on what could be the problem here?

Thanks!

marcus1487 commented 5 years ago

This error indicates that the ground truth sites (mod_pos.bed and unmod_pos.bed) don't overlap the results found in the per-read stats file.

There are many reasons this could happen (e.g. mismatched chromosome names, off-by-one coordinates). I would suggest first loading the ground truth bed files into a genome browser on the reference sequence FASTA, alongside the text output of the aggregated tombo stats derived from this per-read stats file. This may reveal why these coordinates do not overlap.
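A quick scripted check can complement the genome browser. The following is only a minimal sketch, not part of tombo: it assumes the positions covered by the stats have already been exported to a BED-like text file, and the helper names `read_bed_sites` and `diagnose_overlap` are hypothetical. It compares chromosome names between the two site sets and counts how many ground truth positions match at shifts of -1, 0 and +1, which are the two failure modes mentioned above. Strand is ignored for simplicity.

```python
def read_bed_sites(lines):
    """Parse BED-like lines into a set of (chrom, start) tuples.

    Only the first two columns are used; strand is ignored in this sketch.
    """
    sites = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        sites.add((fields[0], int(fields[1])))
    return sites


def diagnose_overlap(truth_sites, stat_sites):
    """Report chromosome-name agreement and overlap at small coordinate shifts."""
    truth_chroms = {chrom for chrom, _ in truth_sites}
    stat_chroms = {chrom for chrom, _ in stat_sites}
    # Count ground truth sites found in the stats after shifting by -1, 0, +1.
    overlap_by_shift = {
        shift: sum((chrom, pos + shift) in stat_sites for chrom, pos in truth_sites)
        for shift in (-1, 0, 1)
    }
    return {
        "shared_chroms": sorted(truth_chroms & stat_chroms),
        "truth_only_chroms": sorted(truth_chroms - stat_chroms),
        "stats_only_chroms": sorted(stat_chroms - truth_chroms),
        "overlap_by_shift": overlap_by_shift,
    }


if __name__ == "__main__":
    # Toy data standing in for mod_pos.bed and an exported stats file.
    truth = read_bed_sites(["chr1\t100\t101", "chr1\t200\t201"])
    stats = read_bed_sites(["chr1\t101\t102", "chr1\t201\t202"])
    print(diagnose_overlap(truth, stats))
```

An empty `shared_chroms` list points to a naming mismatch (for example `1` versus `chr1`), while zero overlap at shift 0 but non-zero overlap at -1 or +1 suggests an off-by-one difference between the coordinate conventions of the two files.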

P.S. The --genome-fasta argument is not required (or used, I think) when providing a set of ground truth locations. It is meant for the motif running mode of this command.