Question about the available test methods and the input argument

tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data

https://nanocompore.rna.rocks

GNU General Public License v3.0


Question about the available test methods and the input argument #130

Closed mmiladi closed 4 years ago

mmiladi commented 4 years ago

Hi,

I am running sampcomp on data for two conditions, with only one replicate each. Which test would you recommend for the comparison? Also, based on your experience, which of the metrics gives the most reliable results? And as a last question, would you recommend changing the default settings for identifying the modified sites? Specifically, any of the statistical testing options (--comparison_methods, --sequence_context, --sequence_context_weights and --logit).

Thanks and best, Milad

a-slide commented 4 years ago

Hi @mmiladi, A single replicate will not give you reliable results. The data is very noisy, and replicating your experiment is the best way to get rid of a lot of this variance. That said, if you wish to proceed anyway, in our hands the best method for m6A is the GMM logit with a sequence context of 2 (--logit --sequence_context 2). I don't know which modification you are interested in, though, and other settings could work better for you. To be honest, it is hard to know in advance.

Best Ad
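To give a feel for why a sequence context can help, here is a minimal sketch of pooling per-position p-values over a window of neighbouring positions. It uses Fisher's method, which assumes independent tests; neighbouring nanopore positions share k-mers and are correlated, so this is an illustration of the general idea only and not a description of how Nanocompore implements --sequence_context. The function names `fisher_combine` and `combine_over_context` are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (assumes independent tests).

    The statistic X = -2 * sum(ln p) follows a chi-squared distribution
    with 2k degrees of freedom; for even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    # Clamp to avoid log(0) if a p-value underflowed to exactly zero.
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def combine_over_context(pvalues, context):
    """For each position, pool the p-values of positions i-context..i+context.

    Windows are clipped at the ends of the transcript, so edge positions
    pool fewer values. This is an illustrative choice, not Nanocompore's.
    """
    n = len(pvalues)
    combined = []
    for i in range(n):
        lo = max(0, i - context)
        hi = min(n, i + context + 1)
        combined.append(fisher_combine(pvalues[lo:hi]))
    return combined


if __name__ == "__main__":
    isolated = [0.9, 0.9, 1e-4, 0.9, 0.9]   # one low p-value, likely noise
    cluster = [0.01, 0.01, 0.01, 0.01, 0.01]  # consistent signal across neighbours
    print(combine_over_context(isolated, 2)[2])
    print(combine_over_context(cluster, 2)[2])
```

With a context of 0 every position keeps its own p-value. With a wider window, an isolated low p-value surrounded by unremarkable neighbours is weakened, while a run of moderately low p-values reinforces itself, which is the behaviour one would hope for from a modification that disturbs the signal over several adjacent k-mers.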

mmiladi commented 4 years ago

Hi @a-slide ,

Thanks for the feedback. I'll follow the suggested parameters; in particular, my current sequence context of zero looks like an important one to change. I am not targeting a specific methylation at this stage and would like to identify potential sites of any modification.

Regarding the p-value metrics, is there a (strong) dependency on read coverage depth? Looking at the p-values (GMM_logit, KS_dwell and KS_intensity) alongside the coverage graph for my data, it "feels" like high p-values are reported wherever there is a drop in the IVT coverage. Of course, it could be a true signal at the transcript 3'/5' ends or a systematic bias of the ONT data.

Best, M

a-slide commented 4 years ago

Yes, there is a dependency on read coverage, particularly for the KS methods, but the GMM method does a better job. We benchmarked Nanocompore against other tools, including Tombo, and Nanocompore has (by far) better control of p-value inflation. We have been thinking of "correcting" p-values by read coverage, but it feels wonky and we have not found a proper statistical method to do it so far. Any suggestions? As a compromise, we might include a peak calling method in the next release of Nanocompore to de-noise and refine the positions of mods (see https://github.com/tleonardi/nanocompore/issues/95)
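
The coverage dependency of a KS-type test can be illustrated with a small standard-library sketch. It computes the two-sample Kolmogorov-Smirnov statistic and an asymptotic p-value (the Kolmogorov series with the usual small-sample correction), then evaluates the same difference between distributions at different read depths. This is generic statistics and not Nanocompore code; the function names and the coverage values are made up for illustration.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x, then compare them.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value for KS statistic d with sample sizes n, m."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the p-value is ~1
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    d = 0.2  # same observed gap between the two signal distributions
    for coverage in (30, 100, 300, 1000):  # hypothetical reads per condition
        print(coverage, ks_pvalue(d, coverage, coverage))
```

For a fixed gap between the two distributions, the p-value shrinks quickly as the number of reads per condition grows, so positions with different coverage are not directly comparable on the p-value scale. When coverage drops in only one condition, the effective sample size `n * m / (n + m)` is dominated by the smaller sample.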