nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

bed file format for modified/unmodified locations #153

JiahuiWangGit closed this issue 5 years ago

JiahuiWangGit commented 5 years ago

Hi,

I was using the plot roc command to plot ROC curves and get the AUC, but got this error:

(tombo_36) [wangj@helix tombo]$ tombo plot roc --statistics-filenames mathylation_testing.5mC.tombo.stats --genome-fasta $genome_fasta
**** ERROR **** Must provide either motifs or bed files describing ground truth modification locations.

After seeing this error, I prepared BED files of modified and unmodified locations:

(tombo_36) [wangj@helix tombo]$ head *.bed
==> bisulfite.ENCFF835NTC.example.mod.bed <==
chr20 5001719 5001720 . 1 + 5001719 5001720 255,0,0 1 100
chr20 5002520 5002521 . 28 + 5002520 5002521 255,0,0 28 100
chr20 5002788 5002789 . 23 + 5002788 5002789 255,0,0 23 100
chr20 5002912 5002913 . 21 + 5002912 5002913 255,0,0 21 100
chr20 5003068 5003069 . 14 - 5003068 5003069 255,0,0 14 100

==> bisulfite.ENCFF835NTC.example.unmod.bed <==
chr20 5000225 5000226 . 14 - 5000225 5000226 0,255,0 14 0
chr20 5000237 5000238 . 15 - 5000237 5000238 0,255,0 15 0
chr20 5000268 5000269 . 19 + 5000268 5000269 0,255,0 19 0
chr20 5000269 5000270 . 16 - 5000269 5000270 0,255,0 16 0
chr20 5000361 5000362 . 13 + 5000361 5000362 0,255,0 13 0
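
The two files above appear to be ENCODE (ENCFF accession) bisulfite bedMethyl records split into fully methylated (last column 100) and fully unmethylated (last column 0) sites. The following is a minimal sketch of such a split. It assumes the bedMethyl layout in which column 10 is read coverage and column 11 is percent methylated. The helper names split_bedmethyl and write_bed are hypothetical, and this is not necessarily how these particular files were produced:

```python
import io


def split_bedmethyl(lines, min_cov=1):
    """Split bedMethyl-style records into fully methylated and fully unmethylated sites.

    Hypothetical helper: assumes column 10 is read coverage and column 11 is
    percent methylated, as the ENCODE records shown above appear to be.
    """
    mod, unmod = [], []
    for line in lines:
        fields = line.split()  # accepts tab- or space-separated records
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # skip blank, header and comment lines
        coverage, pct = int(fields[9]), float(fields[10])
        if coverage < min_cov:
            continue  # too few reads to trust the methylation call
        if pct == 100.0:
            mod.append(fields)
        elif pct == 0.0:
            unmod.append(fields)
        # partially methylated sites are kept out of both ground-truth sets
    return mod, unmod


def write_bed(rows, handle):
    """Write rows as tab-separated BED lines to an open text handle."""
    for fields in rows:
        handle.write("\t".join(fields) + "\n")


# Two records copied from the head output above.
example = io.StringIO(
    "chr20 5001719 5001720 . 1 + 5001719 5001720 255,0,0 1 100\n"
    "chr20 5000225 5000226 . 14 - 5000225 5000226 0,255,0 14 0\n"
)
mod, unmod = split_bedmethyl(example)
out = io.StringIO()
write_bed(mod, out)
```

On a full dataset you would pass an open handle for the original bedMethyl file and write each list to its own .bed file. Requiring exactly 100% or 0% with min_cov=1 is the loosest choice. Under the assumed layout, the first modified record above is supported by a single read, so a higher min_cov may give a cleaner ground truth.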

Then I ran the command below with both BED files and got this error:

(tombo_36) [wangj@helix tombo]$ tombo plot roc --statistics-filenames \
  mathylation_testing.5mC.tombo.stats \
  --modified-locations bisulfite.ENCFF835NTC.example.mod.bed \
  --unmodified-locations bisulfite.ENCFF835NTC.example.unmod.bed \
  --genome-fasta $genome_fasta

Traceback (most recent call last):
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/bin/tombo", line 11, in <module>
    sys.exit(main())
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/lib/python3.6/site-packages/tombo/main.py", line 279, in main
    _plot_commands.plot_main(args)
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/lib/python3.6/site-packages/tombo/_plot_commands.py", line 2344, in plot_main
    plot_roc(**kwargs)
  File "/projects/li-lab/software/miniconda3/envs/tombo_36/lib/python3.6/site-packages/tombo/_plot_commands.py", line 99, in plot_roc
    mod_name, mod_fn = mod_name_fn.split(':')
ValueError: not enough values to unpack (expected 2, got 1)

Is this error caused by an incorrect BED file format?

Thanks, Jiahui

marcus1487 commented 5 years ago

This error is not due to the BED format but to the command-line arguments. Each ground-truth modified-location file is paired with a name so that a single plotting command can compare several samples or modification types. As the command-line help (tombo plot roc -h) describes, each --modified-locations value should be formatted as "mod_name:locs.bed".
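
The failing line is visible at the end of the traceback: mod_name, mod_fn = mod_name_fn.split(':'). The sketch below is not tombo's code. It isolates that one line in a hypothetical function, parse_mod_location, to show why a bare file name fails while "mod_name:locs.bed" works:

```python
def parse_mod_location(mod_name_fn):
    """Hypothetical stand-in for the parsing line shown in the traceback."""
    # Same unpacking as the line reported at _plot_commands.py line 99.
    mod_name, mod_fn = mod_name_fn.split(':')
    return mod_name, mod_fn


# With a name prefix, split() yields two parts and unpacking succeeds.
print(parse_mod_location("Bisulfite Ground Truth:bisulfite.ENCFF835NTC.example.mod.bed"))

# Without a prefix, split() returns a one-element list, so unpacking fails.
try:
    parse_mod_location("bisulfite.ENCFF835NTC.example.mod.bed")
except ValueError as err:
    print(err)
```

Because the value is split on every colon, a name or path that itself contains ':' would also fail, this time with "too many values to unpack".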

So in this case your command could be:

tombo plot roc --statistics-filenames \
  mathylation_testing.5mC.tombo.stats \
  --modified-locations "Bisulfite Ground Truth":bisulfite.ENCFF835NTC.example.mod.bed \
  --unmodified-locations bisulfite.ENCFF835NTC.example.unmod.bed \
  --genome-fasta $genome_fasta
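
In this command, the double quotes only protect the spaces in the name. The shell joins "Bisulfite Ground Truth" and :bisulfite.ENCFF835NTC.example.mod.bed into a single argument. Python's shlex.split follows POSIX shell word splitting closely enough to show this. The --genome-fasta $genome_fasta part is left out because shlex does not expand shell variables. The list form at the end is an illustration of calling the same command from Python, where no shell quoting is needed:

```python
import shlex

cmd = (
    'tombo plot roc --statistics-filenames mathylation_testing.5mC.tombo.stats '
    '--modified-locations "Bisulfite Ground Truth":bisulfite.ENCFF835NTC.example.mod.bed '
    '--unmodified-locations bisulfite.ENCFF835NTC.example.unmod.bed'
)
argv = shlex.split(cmd)

# The value after --modified-locations is one word: quotes removed, colon kept.
mod_arg = argv[argv.index("--modified-locations") + 1]
print(mod_arg)
print(mod_arg.split(":"))

# From Python, a list argv avoids shell quoting entirely; each "name:path"
# pair is simply one list element. Passing it (plus "--genome-fasta" and a
# path) to subprocess.run would launch the same command.
argv_list = [
    "tombo", "plot", "roc",
    "--statistics-filenames", "mathylation_testing.5mC.tombo.stats",
    "--modified-locations", "Bisulfite Ground Truth:bisulfite.ENCFF835NTC.example.mod.bed",
    "--unmodified-locations", "bisulfite.ENCFF835NTC.example.unmod.bed",
]
```
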

JiahuiWangGit commented 5 years ago

Great! It works! Thanks a lot.