nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

How to detect the background noise from control and test samples #172

Open xiangpingyu opened 5 months ago

xiangpingyu commented 5 months ago

Dear developers,

Attached are the results for two samples, 1# (control) and 2# (modified_test), generated with modkit summary and modkit sample-probs using default parameters. I am unsure how to adjust the settings in modkit extract to accurately assess the modification status of the modified_test sample, particularly since we have no appropriate reference sequence for our experiments.

I have reviewed the related issue #147, but I am still unclear on how to set the --filter-threshold and --mod-threshold parameters effectively based on the attached files. Is there a way to establish a threshold for background noise in these two samples? Looking forward to your reply. Thank you all!

Sophia

1.summary.csv 2.summary.csv 2_probabilities.txt 1_probabilities.txt 2_thresholds.csv 1_thresholds.csv

ArtRand commented 5 months ago

Hello @xiangpingyu,

In general, you don't have to do anything to adjust the thresholds. The estimated threshold values in both of your samples seem to be about the same. If you want to estimate the false positive rate, use a sample where you know there should not be any 6mA bases, for example a PCR-amplified sample or a genome from an organism that you know lacks the enzymes necessary to make this modification. With a sample like this, you know that all 6mA calls must be false positives. You can use modkit pileup or modkit summary to aggregate these data (use the --only-mapped flag if you decide to use modkit summary). Optionally, you could run modkit sample-probs on a sample where you know there is 6mA at a reasonable level and take the default threshold value from that sample, then use that value when you calculate the false positive rate as I've just described. I would do it both ways as a sanity check, because I would not expect the results to be very different. Hope this helps.
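The false positive calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not modkit's implementation: the function name `false_positive_rate`, the input list, and the threshold value are all hypothetical, and the filtering model is simplified to a single modification (6mA versus canonical A), where a call is kept if its confidence `max(p, 1 - p)` meets the threshold and counted as modified if `p > 0.5`.

```python
from typing import Iterable


def false_positive_rate(mod_probs: Iterable[float], filter_threshold: float) -> float:
    """Fraction of passing calls that are (falsely) called modified.

    mod_probs: per-base 6mA probabilities from a negative-control sample,
               i.e. one where no 6mA is expected (hypothetical input).
    filter_threshold: confidence cutoff; calls below it are discarded.
    """
    passed = 0
    false_positives = 0
    for p in mod_probs:
        # Simplified two-class model (assumption): confidence is the
        # probability of the more likely of "modified" and "canonical".
        confidence = max(p, 1.0 - p)
        if confidence < filter_threshold:
            continue  # low-confidence call, filtered out
        passed += 1
        if p > 0.5:
            # The control has no real 6mA, so any modified call is false.
            false_positives += 1
    if passed == 0:
        raise ValueError("no calls passed the filter threshold")
    return false_positives / passed


# Made-up probabilities for illustration only, not real data.
control_probs = [0.02, 0.08, 0.55, 0.97, 0.01, 0.30, 0.95, 0.05]
fpr = false_positive_rate(control_probs, filter_threshold=0.9)
print(f"estimated false positive rate: {fpr:.3f}")
```

Running the same function once with the threshold estimated from the control and once with the threshold estimated from a sample known to contain 6mA corresponds to the "do it both ways" sanity check.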

A

xiangpingyu commented 5 months ago

Hello @ArtRand,

Because we lack a suitable reference for our libraries, our methylation BAM files are not aligned to any reference. I've found that modkit pileup and modkit summary (with the --only-mapped flag) cannot aggregate the data in these unaligned BAM files for further analysis.

Could there be something I'm misunderstanding?

I look forward to your assistance.

Thank you!