Closed by LeilyR 4 years ago
1) The `statistic` output type (from the `tombo text_output browser_files` command) should not be allowed for a de novo statistics file. There is no p-value associated with an aggregated de novo statistics file. Statistics files generated by detection methods that produce per-read statistics are aggregated to produce fraction and coverage values. Level comparison methods, on the other hand, produce p-values and effect size statistics (depending on the type of test employed), which can be output via the `statistic` output.
Are you sure that the statistics file was produced by the `tombo detect_modifications de_novo` command? Could you post the exact commands used to produce this statistics output file?
2) This is a question for your sample of interest. Many biological samples produce valid direct RNA reads mapping to the reverse strand. If you determine that reverse strand mapping reads are irrelevant for your research, then they can indeed be ignored.
3) The `signal` output is the mean over reads of the normalized signal values assigned to that base. See the docs on this output, and on the re-squiggle algorithm that assigns signal to reference bases.
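As a rough illustration of what "mean over reads of the normalized signal values assigned to that base" means, here is a minimal sketch. The input layout (one dict per read, mapping reference position to normalized signal level) and the function name are hypothetical and are not Tombo's internal data structures.

```python
from statistics import mean

# Hypothetical input: for each read, the normalized signal level that a
# re-squiggle-like step assigned to each reference position it covers.
reads = [
    {100: 0.5, 101: -1.0, 102: 0.25},
    {100: 0.7, 101: -0.8},
    {101: -1.2, 102: 0.35},
]


def mean_signal_per_position(reads):
    """Average the normalized signal over all reads covering each position."""
    levels_by_pos = {}
    for read in reads:
        for pos, level in read.items():
            levels_by_pos.setdefault(pos, []).append(level)
    return {pos: mean(levels) for pos, levels in sorted(levels_by_pos.items())}


signal_track = mean_signal_per_position(reads)
```

Positions covered by few reads get a mean computed from few values, so the track is noisier at low coverage.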
Hi Marcus,
thanks a lot for the answers.
Tombo version: 1.5.1
Commands:
tombo detect_modifications de_novo --fast5-basedirs fast5 --statistics-file-basename denovo --rna --fishers-method-context 2 --minimum-test-reads 1 --per-read-statistics-basename perRead_denovo --num-most-significant-stored 20000 --processes 30
tombo text_output browser_files --fast5-basedirs fast5/ --statistics-filename denovo.tombo.stats --genome-fasta transcripts.fa --file-types statistic
and the output is the one I have already sent you in my previous comment.
So, if I understood it correctly, there are no p-values to check for the significance of de novo or alternative models, right? What would be your suggestion? I am getting quite a lot of modified bases with fraction = 1. I have seen your recommendation to use dampened fractions, but I am not sure I understand how to filter them for the most significant positions. Am I just using an arbitrary threshold and keeping the bases with a dampened fraction above it? Thanks a lot again! Cheers, Leily
Yes, that is correct. There are no p-values to check in the per-reference site statistics files for de novo or alternative models. The p-values are computed at the per-read level, then these values are converted to binary modified/unmodified calls, and a fraction is reported in the per-reference site statistics file.
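The aggregation described here can be sketched roughly as follows. This is an illustration only, not Tombo's implementation: the function name and the 0.05 per-read cutoff are assumptions chosen for the example.

```python
def fraction_modified(per_read_pvalues, threshold=0.05):
    """Return (fraction, coverage) for one reference site.

    Each per-read p-value is turned into a binary call (modified if it is
    below the threshold), and the fraction of modified calls is reported.
    The threshold value is an assumption for illustration.
    """
    calls = [p < threshold for p in per_read_pvalues]
    coverage = len(calls)
    if coverage == 0:
        return None, 0
    return sum(calls) / coverage, coverage


# A site covered by a single read can only yield a fraction of 0 or 1.
single_read_site = fraction_modified([0.01])
deeper_site = fraction_modified([0.01, 0.2, 0.03, 0.6])
```

Under this scheme, a low minimum read count (such as `--minimum-test-reads 1` in the commands above) lets single-read sites through, which may account for some of the sites reported with fraction = 1.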
I took a look and there was a bug in the way the `browser_files` command checked the outputs. The `statistic` output should not have been allowed given a de novo statistics file. What was output is the `dampened_fraction` output. I have pushed a fix to the GitHub repo with this logic corrected.
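For intuition about why a dampened fraction differs from a raw fraction at low coverage, here is a minimal sketch. It assumes the dampening works by adding pseudo-counts of unmodified and modified reads before taking the ratio; the function name and the pseudo-count values (2 and 0) are assumptions for illustration and should be checked against the Tombo documentation.

```python
def dampened_fraction(n_mod, n_unmod, pseudo_unmod=2, pseudo_mod=0):
    """Fraction modified after adding pseudo-counts (assumed scheme).

    Pseudo-counts pull estimates from low-coverage sites toward the
    unmodified state, so one modified read no longer yields 1.0.
    """
    total = n_mod + n_unmod + pseudo_mod + pseudo_unmod
    if total == 0:
        return None
    return (n_mod + pseudo_mod) / total


low_coverage = dampened_fraction(n_mod=1, n_unmod=0)    # raw fraction would be 1.0
high_coverage = dampened_fraction(n_mod=50, n_unmod=0)  # raw fraction would be 1.0
```

With these assumed pseudo-counts, the single-read site drops to about 0.33 while the 50-read site stays near 0.96, which would explain wig values such as 0.2500, 0.3333, and 0.5000 that look like small-integer ratios rather than p-values.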
For the more central issue of high false positives in RNA, this is a known issue. RNA signal is quite a bit trickier to model with a k-mer model, leading to a high false positive rate (lots of fraction modified = 1) for RNA de novo detection. The `level_sample_compare` method has shown good results for RNA modified base detection, but requires a control sample (e.g. IVT). We are working hard to provide alternative methods for modified RNA detection, but these will likely not be k-mer based due to these limitations.
Great, very helpful! Would you still recommend setting a threshold on the dampened fraction, or should I rather consider some integrative analysis (using other data or tools, such as motifs, to confirm the modification) to filter out some of those 1s from the fraction file?
I would recommend some integrative analyses, keeping the potentially high false positive rate in mind. In particular, many of the sites may reflect systematic bias, because the k-mer model does not adequately capture the variation in RNA signal. For example, an enriched motif may indicate a modified base at that motif, or a systematic error at that k-mer in a particular sample. There may be some value in using the de novo model for direct RNA modified base analysis, but these conclusions should certainly be verified by other means.
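One simple form of such an integrative filter is to keep only sites that pass a fraction cutoff and are also supported by an independent line of evidence. The sketch below assumes the browser file is a variableStep wiggle file; the 0.8 cutoff, the function names, and the `independent_sites` set (for example from a motif scan or an orthogonal assay) are all hypothetical.

```python
def parse_variable_step_wig(text):
    """Parse variableStep wiggle text into {(chrom, position): value}."""
    values = {}
    chrom = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        if line.startswith("variableStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            continue
        pos, value = line.split()
        values[(chrom, int(pos))] = float(value)
    return values


def candidate_sites(wig_values, independent_sites, min_value=0.8):
    """Sites above the cutoff that are also supported by independent evidence."""
    return sorted(
        site
        for site, value in wig_values.items()
        if value >= min_value and site in independent_sites
    )


example_wig = """track type=wiggle_0 name="dampened_fraction"
variableStep chrom=tx1 span=1
5 0.2500
6 0.9000
7 0.9500
"""
independent_sites = {("tx1", 7), ("tx1", 9)}  # hypothetical orthogonal evidence
kept = candidate_sites(parse_variable_step_wig(example_wig), independent_sites)
```

The cutoff alone remains arbitrary; the intersection is what guards against systematic k-mer errors, and only if the second evidence source does not share the same bias.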
Thanks a million!
Hi,
[...] `statistic` file type for de novo detected modifications. I saw in the documentation that you were using it for `level_sample_compare`, but could not find an example for the de novo stat file. Am I right that the reported values in this wig file are the p-values of the statistical test used for the de novo detection at each location? If so, does it make sense to filter out the positions with p-values bigger than, let's say, 0.05? For example these ones:
5 0.2500
6 0.4000
7 0.4000
8 0.3333
9 0.3333
10 0.2500
11 0.5000
12 0.5000
[...] `signal` in `text_output browser_files`. Thank you very much!