Open hackkr opened 1 month ago
is it appropriate to use the same thresholds?
It is often a good idea to use the same thresholds across a larger experiment, especially when the estimated thresholds are as close as they are here. When applying the thresholds to new samples, though, it is important to note the fraction of calls filtered. When this fraction is large, it can heavily skew downstream results, and it often indicates a poor-quality run (due to contamination, very high mod content, or other differing run conditions).
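As a rough illustration of checking the fraction of calls filtered, here is a minimal Python sketch. It assumes you already have per-call confidence values for each sample (for example, parsed from modkit output); the `fraction_filtered` helper, the sample values, and the shared threshold are all hypothetical and not part of modkit.

```python
def fraction_filtered(confidences, threshold):
    """Return the fraction of calls whose confidence is below the threshold."""
    if not confidences:
        raise ValueError("no calls provided")
    n_filtered = sum(1 for c in confidences if c < threshold)
    return n_filtered / len(confidences)


# Hypothetical per-call confidences for two samples from the same run.
sample_a = [0.99, 0.95, 0.62, 0.88, 0.97]
sample_b = [0.71, 0.64, 0.93, 0.58, 0.69]
shared_threshold = 0.70  # hypothetical threshold applied to both samples

for name, conf in (("sample_a", sample_a), ("sample_b", sample_b)):
    frac = fraction_filtered(conf, shared_threshold)
    print(f"{name}: {frac:.0%} of calls filtered")
```

A sample that loses a much larger share of its calls than the others at the same threshold is worth a closer look before it goes into downstream comparisons.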
Is there a way, similar to modkit validate to use another sample as a ground truth?
I'm not quite sure what you mean here. If you want to use the combination of these samples to estimate the thresholds, you can do that by merging the BAM files (potentially with some balancing). The modkit validate command is generally intended for strands where the modification status of each read in the sample is known with certainty at a particular reference position (primarily synthetically printed strands). The command will run on any file presented, but the results may not be meaningful.
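To make the idea of balancing concrete, here is a small Python sketch, assuming per-call confidence values have already been collected for each sample. It draws the same number of calls from every sample before pooling, then takes a simple percentile of the pooled confidences as the threshold. The function names and the 10% filter fraction are illustrative assumptions, and modkit's own estimation procedure may differ in its details.

```python
import random


def balanced_pool(samples, seed=0):
    """Pool an equal number of calls from each sample (size of the smallest)."""
    rng = random.Random(seed)
    n = min(len(values) for values in samples.values())
    pooled = []
    for values in samples.values():
        pooled.extend(rng.sample(values, n))
    return pooled


def percentile_threshold(confidences, filter_fraction=0.1):
    """Confidence value below which roughly `filter_fraction` of calls fall."""
    ordered = sorted(confidences)
    idx = min(int(filter_fraction * len(ordered)), len(ordered) - 1)
    return ordered[idx]


# Hypothetical confidences; the control sample has far more calls.
samples = {
    "control": [0.55, 0.6, 0.7, 0.8, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95],
    "treated": [0.65, 0.85, 0.9, 0.97],
}
pooled = balanced_pool(samples)
print(len(pooled), percentile_threshold(pooled))
```

Without balancing, the sample with the most calls would dominate the pooled estimate.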
would it be best to calculate position-specific thresholds?
We would generally advise against position-specific thresholds. Users are free to explore this option (probably via the modkit extract output), but position-specific thresholds are not implemented in any modkit commands, so applying them would have to be hand-rolled by the user.
I hope this helps and please let me know if further clarification would help with your analysis.
Hello,
I ran a preliminary experiment to see if the all-context methylation models work in an unusual system. Basically, I amplified the whole genome to remove any marks, then treated samples with either water, AluI methyltransferase (5mC), or EcoRI methyltransferase (6mA), and then rapid barcoded and sequenced them. Samples were basecalled with supv5.0.0 models.
When I run modkit summary, all of my samples have similar levels of 6mA detected and slightly different threshold values.
I have two questions: for samples that were sequenced together, is it appropriate to use the same thresholds? Is there a way, similar to modkit validate, to use another sample as a ground truth?
I plan to look at position-specific differential methylation next, but want to be sure I understand the rationale behind thresholds. Because there is a known motif for both samples relative to the negative control, would it be best to calculate position-specific thresholds?