Closed by billytcl 2 months ago
There are a number of factors that can affect the probabilities output by the Remora model. These include the overall modified base context, including modified bases in close proximity (within 10 bases of one another). Additionally, there may be some run-to-run variability contributing to the output probabilities. I would suggest that normalizing these may not be advisable: the model is fundamentally outputting a lower confidence at these calls, which is likely meaningful. There may be settings where normalizing the output probabilities is beneficial, but I would try to avoid it for most generic analyses.
We are certainly aiming to constrain these probabilities to a more consistent distribution, both through modeling and through increased consistency on the platform. I hope this helps, but please post more details if you have particular downstream analyses that require these probabilities to be normalized.
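To check whether nearby modified bases explain lower-confidence calls in your own data, one option is to split calls by whether another modified-base call lies within 10 bases. The following is only a minimal sketch: `has_close_neighbor` is a hypothetical helper, not part of Remora, and it assumes you have already extracted the (unique) positions of called modified bases for a read or region by whatever means you normally use.

```python
from bisect import bisect_left, bisect_right


def has_close_neighbor(positions, window=10):
    """Flag positions that have another position within `window` bases.

    `positions` is an iterable of unique integer coordinates (assumption:
    one entry per called modified base). Returns a list of booleans in the
    same order as the input.
    """
    positions = list(positions)
    ordered = sorted(positions)
    flags = []
    for pos in positions:
        lo = bisect_left(ordered, pos - window)
        hi = bisect_right(ordered, pos + window)
        # ordered[lo:hi] always contains pos itself, so more than one
        # entry means at least one other call is within the window.
        flags.append(hi - lo > 1)
    return flags


if __name__ == "__main__":
    calls = [100, 105, 200, 211, 500]
    print(list(zip(calls, has_close_neighbor(calls))))
```

Comparing the probability distributions of the flagged and unflagged groups would show how much of a sample's spread comes from call density rather than from the run itself.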
Looking at a few cfDNA samples across different runs, we've noticed instances where the meth_qual distribution varies widely. E.g., some samples are strongly piled up near 0 or 255, and others less so.
Since it's all cfDNA and all using the same sample prep, I was wondering whether there are nuances in the way Remora calls methylation that we should consider, or whether there is a way to bioinformatically batch-correct/normalize for these differences?
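To make the difference between samples concrete before deciding on any correction, it may help to summarize each sample's score distribution with a few comparable numbers. This is a rough sketch only: it assumes the meth_qual values have already been extracted as integers in the 0 to 255 range, `summarize_scores` is a hypothetical helper, and the `low`/`high` cutoffs are arbitrary illustrative choices rather than recommended thresholds.

```python
from collections import Counter


def summarize_scores(scores, low=25, high=230, n_bins=8):
    """Summarize a list of 0-255 scores for one sample.

    Returns the fraction of calls at or below `low`, at or above `high`,
    in between, and a coarse normalized histogram with `n_bins` bins.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no scores supplied")
    if any(s < 0 or s > 255 for s in scores):
        raise ValueError("scores must lie in 0..255")

    n = len(scores)
    frac_low = sum(s <= low for s in scores) / n
    frac_high = sum(s >= high for s in scores) / n

    bin_width = 256 // n_bins
    # Clamp to the last bin in case 256 is not divisible by n_bins.
    counts = Counter(min(s // bin_width, n_bins - 1) for s in scores)
    hist = [counts.get(b, 0) / n for b in range(n_bins)]

    return {
        "n": n,
        "frac_low": frac_low,
        "frac_high": frac_high,
        "frac_mid": 1 - frac_low - frac_high,
        "hist": hist,
    }


if __name__ == "__main__":
    sample_a = [0, 10, 250, 255, 128, 128, 30, 200]
    print(summarize_scores(sample_a))
```

Running this per sample and comparing `frac_mid` across runs gives a simple view of which samples carry more intermediate-confidence calls, without altering the probabilities themselves.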