nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

text output #306

Open AicuiZhang opened 3 years ago

AicuiZhang commented 3 years ago

Hi, dear developer!

When I ran "tombo detect_modifications model_sample_compare" and "tombo text_output browser_files", I ran into a problem. My commands are as follows:

tombo detect_modifications model_sample_compare --fast5-basedirs ./treatment-fast5/ --control-fast5-basedirs ./control-fast5/ --statistics-file-basename all.model_compare_sample --per-read-statistics-basename perRead_all --sample-only-estimates --processes 12

and

tombo text_output browser_files --fast5-basedirs ./treatment-fast5/ --control-fast5-basedirs ./control-fast5/ --statistics-filename all.model_compare_sample.tombo.stats --file-types fraction dampened_fraction coverage valid_coverage signal signal_sd dwell difference --browser-file-basename ./all

The problem is that the signal is broken (has gaps) in the all.fraction_modified_reads.plus.wig file, but is complete in other files, such as all.signal.sample.plus.wig.

Here is an IGV screenshot: [image]

Looking forward to your reply! Thank you!

marcus1487 commented 3 years ago

This is most likely due to the coverage filter applied during the detect modifications step. Not at my computer right now but the command line help should identify the correct argument.

AicuiZhang commented 3 years ago

We noticed that the default value of --single-read-threshold for RNA is 0.05-0.4, so we changed our command to

"tombo detect_modifications model_sample_compare --single-read-threshold 0.2 0.25 --fast5-basedirs /xtdisk/renjie_group/zhangaicui/coronavirus/The_fourth/SiMETTL1/20200918_0831_MN27920_FAO15302_e7cde67d/virus-fast5/ --control-fast5-basedirs /xtdisk/renjie_group/zhangaicui/coronavirus/The_fourth/SINC/20200918_0829_MN26079_FAO10756_0b8ab8c1/virus-fast5/ --statistics-file-basename /xtdisk/renjie_group/zhangaicui/coronavirus/The_fourth/compare-1208/all.model_compare_sample --per-read-statistics-basename /xtdisk/renjie_group/zhangaicui/coronavirus/The_fourth/compare-1208/perRead_all --sample-only-estimates --processes 12".

But the issue persists. In addition, we just found an error message: [image]

I am not sure whether an incomplete "tombo detect_modifications model_sample_compare" run would cause some text_output files to be complete while others are not. Also, the coverage in the all.valid_coverage.plus.wig file does not match the coverage of the BAM file obtained after aligning to the genome. The all.dampened_fraction_modified_reads.plus.wig and all.fraction_modified_reads.plus.wig files are incomplete.

That is everything we have found so far. We hope it helps to solve the problem.

Looking forward to your reply.

Thank you

marcus1487 commented 3 years ago

Setting --single-read-threshold to 0.2 0.25 still discards calls between 0.2 and 0.25 during aggregation when computing the fraction modified. This is the reason for the difference in values reported in the fraction modified wig. Setting this parameter to a single value will include all reads. I would warn, though, that this may lead to many false positive calls, making the fraction estimates less accurate.
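
To illustrate the effect of the two thresholds, here is a minimal Python sketch. It is not Tombo's actual code. It assumes that a per-read statistic at or below the lower threshold counts as modified, one at or above the upper threshold counts as canonical, and anything in between is dropped. The function name `fraction_modified` and the example values are hypothetical.

```python
def fraction_modified(read_stats, lower, upper):
    """Return (fraction_modified, valid_coverage) for one reference position.

    Assumption (not taken from Tombo's source): stat <= lower means a
    modified call, stat >= upper means a canonical call, and reads
    strictly between the two thresholds are discarded.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    n_mod = sum(1 for s in read_stats if s <= lower)
    # "s > lower" keeps a read from being counted twice when lower == upper.
    n_can = sum(1 for s in read_stats if s > lower and s >= upper)
    valid = n_mod + n_can
    # With no valid reads there is nothing to report for this position.
    frac = n_mod / valid if valid else None
    return frac, valid


# Hypothetical per-read statistics at a single position.
stats = [0.01, 0.10, 0.21, 0.22, 0.24, 0.30, 0.60, 0.90]

two_sided = fraction_modified(stats, 0.2, 0.25)   # three reads discarded
single = fraction_modified(stats, 0.2, 0.2)       # every read kept
all_between = fraction_modified([0.21, 0.22], 0.2, 0.25)

print(two_sided, single, all_between)
```

Under these assumptions, a position where every read falls between the thresholds has no valid reads and therefore no fraction to write, which would show up as a gap in the fraction wig while the signal wig stays complete.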

Regarding the difference between the BAM mapping coverage and the Tombo-reported coverage: Tombo uses only a single mapping per read (to avoid over-counting modified bases from the same read at multiple locations). This is most often the reason for these differences. Also, the valid_coverage output only counts the reads that are retained under the --single-read-threshold parameter.
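
The coverage difference can be sketched the same way. This is an illustration only, not Tombo's implementation: the alignment tuples are made-up data, the `coverage` helper is hypothetical, and keeping the first listed mapping per read is an assumption made for the example.

```python
from collections import Counter


def coverage(alignments, one_per_read=False):
    """Count per-position coverage from (read_id, start, end) tuples.

    Intervals are half-open: [start, end). With one_per_read=True only
    the first alignment seen for each read is counted (an assumption
    made for this example).
    """
    cov = Counter()
    seen = set()
    for read_id, start, end in alignments:
        if one_per_read:
            if read_id in seen:
                continue  # skip additional mappings of the same read
            seen.add(read_id)
        for pos in range(start, end):
            cov[pos] += 1
    return cov


# Hypothetical alignments: read r1 maps to two overlapping locations.
alignments = [("r1", 0, 5), ("r1", 3, 8), ("r2", 2, 6)]

bam_like = coverage(alignments)                      # counts every alignment
single_mapping = coverage(alignments, one_per_read=True)

print(bam_like[3], single_mapping[3])
print(bam_like[7], single_mapping[7])
```

In this toy example, coverage computed from every alignment is higher than the one-mapping-per-read coverage wherever a read maps more than once, before any threshold filtering is applied.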

I hope this helps clear up any issues with the output files. If there are more issues, could you list all commands entered, the expected output, and the exact observed output?