Open mmiladi opened 4 years ago
Hi @mmiladi,
At a high level Tombo does not provide a set of best practices and metrics as there are so many possible scenarios for modified base detection, including but not limited to, modified base frequency, modified base type, combinations of modified bases, reference sequence complexity, reference sequence quality, read coverage, target precision/recall, etc. There is no way to provide best practices for all scenarios. Tombo is instead meant to allow easier access to testing out different methods and metrics for modified base detection. The use cases provided work reasonably well, but certainly may not be the best method even for the use cases presented.
To specifically address the question, the available metrics section is correct and dampened_fraction is available for the model_sample_compare method (this is really just de novo where the canonical model is estimated at least in part from user-provided data).
For the recommended metrics section, I would add that dampened_fraction is generally recommended as it provides more robust metrics for a number of situations. One particular example case here is variable coverage in a generally high coverage sample producing false positives at the lower coverage sites. This recommendation applies to the model_sample_compare method as well. For the level_sample_compare method, I have found the best results from the effect size statistics with higher coverage samples, but have made no attempt to test/validate all combinations of metrics available from this method.
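To illustrate why a dampened estimate helps at low-coverage sites, here is a minimal sketch that assumes the dampening works by adding pseudo-counts of unmodified and modified reads before taking the fraction. The function names and the pseudo-count values (2 unmodified, 0 modified) are illustrative assumptions, not Tombo's API.

```python
def raw_fraction(mod_reads, total_reads):
    """Plain fraction of reads called modified at a site."""
    return mod_reads / total_reads if total_reads else float("nan")


def dampened_fraction(mod_reads, total_reads, unmod_pseudo=2.0, mod_pseudo=0.0):
    """Fraction of modified reads after adding pseudo-counts.

    Hypothetical helper: the pseudo-count defaults are assumptions chosen
    for illustration. Adding unmodified pseudo-reads pulls low-coverage
    estimates toward zero while barely changing high-coverage estimates.
    """
    if total_reads < 0 or not 0 <= mod_reads <= total_reads:
        raise ValueError("need 0 <= mod_reads <= total_reads")
    return (mod_reads + mod_pseudo) / (total_reads + unmod_pseudo + mod_pseudo)


# A low-coverage site where both reads look modified, and a high-coverage site.
low_cov = (2, 2)
high_cov = (50, 100)

low_raw = raw_fraction(*low_cov)
low_damp = dampened_fraction(*low_cov)
high_raw = raw_fraction(*high_cov)
high_damp = dampened_fraction(*high_cov)
```

Under these assumptions the two-read site drops from a raw fraction of 1.0 to 0.5, while the 100-read site moves only slightly, so the low-coverage site no longer outranks well-supported sites.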
I hope this is helpful and please do post if there are specific follow up questions here.
Hi @marcus1487,
Thanks very much for the comprehensive description. Below is the updated list based on your feedback.
Available metrics:
de_novo: fraction, dampened_fraction
alternative_model: fraction, dampened_fraction
model_sample_compare: fraction, dampened_fraction
level_sample_compare: statistic effect-size & p-value
Recommended metric:
de_novo: fraction, dampened_fraction
alternative_model: fraction, dampened_fraction
model_sample_compare: fraction, dampened_fraction
level_sample_compare: statistic effect-size
The dampened_fraction is especially recommended to avoid false predictions due to coverage biases (low coverage, coverage bumps*).
A general set of metrics to consider:
I have one other question regarding the thresholds for the effect-size. The concept is less commonly understood than the p-value. What is a typical threshold that users consider in the context of Tombo results?
This is tricky for many of the same reasons listed above. The effect size metrics also have different ranges and behaviors depending upon the test computed. The effect size metrics are explained here in the docs. I would generally recommend that this value not be used blindly with a fixed threshold across different settings. These values are generally best used in some fashion as a ranked list or score for further investigation. I realize this leaves a bit up to the user, but there really isn't a great one-size-fits-all way to give reliable advice here.
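As a rough sketch of using effect sizes as a ranked list rather than applying a fixed cutoff, the snippet below sorts sites by the magnitude of their effect size and keeps the top few for follow-up. The site labels, the values, and the function name are all made up for illustration. Ranking by absolute value is also an assumption, since the appropriate ordering depends on which test produced the effect size.

```python
def rank_sites_by_effect_size(sites, top_n=None):
    """Return (site, effect_size) pairs ordered by decreasing |effect_size|.

    Hypothetical helper: `sites` is any iterable of (label, effect_size)
    pairs, e.g. parsed from a per-site statistics text output.
    """
    ranked = sorted(sites, key=lambda site: abs(site[1]), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up example values, not real Tombo output.
example_sites = [
    ("chr1:100", 0.2),
    ("chr1:250", -1.4),
    ("chr1:300", 0.9),
    ("chr1:420", 0.05),
]

top_candidates = rank_sites_by_effect_size(example_sites, top_n=2)
for label, effect in top_candidates:
    print(f"{label}\t{effect:+.2f}")
```

The top of the list then gives candidates for closer inspection (for example, plotting raw signal at those positions), without committing to a threshold that may not transfer between experiments.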
Hi,
The diversity of available detection modes and text_out types makes it a bit difficult to get an overview of the recommended modification detection metrics for each mode. I have tried to infer it from the documentation and the use cases. Still, it would be great if you could mention which of the metrics fraction, dampened_fraction, and statistics would be the recommended and available metrics for each mode. Below is my current understanding of the available and recommended metrics from reading the documentation. Could you please help with revising it?
Available metrics:
de_novo: fraction, dampened_fraction
alternative_model: fraction, dampened_fraction
model_sample_compare: fraction, dampened_fraction (?)
level_sample_compare: statistic effect-size, statistic p-value
Recommended metric:
de_novo: fraction, dampened_fraction if the coverage is low
alternative_model: fraction, dampened_fraction if the coverage is low
model_sample_compare: (?)
level_sample_compare: statistic (?)