Help for using the proper metric for each mode

mmiladi commented 4 years ago

Hi,

The diversity of available detection modes and text_out types makes it a bit difficult to get an overview of the recommended modification detection metrics for each mode. I have tried to infer this from the documentation and the use cases. Still, it would be great if you could mention which of the metrics (fraction, dampened_fraction, statistics) are available and recommended for each mode. Below is my current understanding from reading the documentation. Could you please help revise it?

Available metrics:

de_novo: fraction, dampened_fraction
alternative_model: fraction, dampened_fraction
model_sample_compare: fraction, dampened_fraction (?)
level_sample_compare: statistic effect-size, statistic p-value

Recommended metric:

de_novo: fraction, dampened_fraction if the coverage is low
alternative_model: fraction, dampened_fraction if the coverage is low
model_sample_compare: (?)
level_sample_compare: statistic (?)

marcus1487 commented 4 years ago

Hi @mmiladi ,

At a high level, Tombo does not provide a set of best practices and metrics, as there are so many possible scenarios for modified base detection. These include, but are not limited to, modified base frequency, modified base type, combinations of modified bases, reference sequence complexity, reference sequence quality, read coverage, target precision/recall, etc. There is no way to provide best practices for all scenarios. Tombo is instead meant to allow easier access to testing out different methods and metrics for modified base detection. The use cases provided work reasonably well, but may not be the best method even for the use cases presented.

To specifically address the question, the available metrics section is correct and dampened_fraction is available for the model_sample_compare method (this is really just de novo where the canonical model is estimated at least in part from user provided data).

For the recommended metrics section, I would add that dampened_fraction is generally recommended as it provides more robust metrics for a number of situations. One particular example case here is variable coverage in a generally high coverage sample producing false positives at the lower coverage sites. This recommendation applies to the model_sample_compare method as well. For the level_sample_compare method, I have found the best results from the effect size statistics with higher coverage samples, but have made no attempt to test/validate all combinations of metrics available from this method.
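To illustrate why a dampened metric helps at low-coverage sites, here is a minimal sketch. It assumes dampening works by adding pseudo-counts to the unmodified and modified read counts before taking the ratio. The function name `modified_fraction` and the pseudo-count values used here (2 unmodified, 0 modified) are illustrative assumptions, not something stated in this thread.

```python
def modified_fraction(n_mod, n_unmod, dampen_unmod=0, dampen_mod=0):
    """Fraction of modified reads at a site, optionally dampened.

    dampen_unmod / dampen_mod are pseudo-counts added to the observed
    counts (assumed dampening scheme; values are illustrative only).
    """
    total = n_mod + n_unmod + dampen_unmod + dampen_mod
    if total == 0:
        return 0.0
    return (n_mod + dampen_mod) / total


# Low-coverage site: 2 of 2 reads called modified.
low_raw = modified_fraction(2, 0)
low_dampened = modified_fraction(2, 0, dampen_unmod=2)

# High-coverage site: 90 of 100 reads called modified.
high_raw = modified_fraction(90, 10)
high_dampened = modified_fraction(90, 10, dampen_unmod=2)

print(low_raw, low_dampened)    # the low-coverage call is pulled down strongly
print(high_raw, high_dampened)  # the high-coverage call barely changes
```

Under these assumptions, the two-read site drops from 1.0 to 0.5, while the 100-read site moves only from 0.9 to about 0.88, so spurious calls at poorly covered positions no longer outrank well-supported ones.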

I hope this is helpful and please do post if there are specific follow up questions here.

mmiladi commented 4 years ago

Hi @marcus1487 ,

Thanks a lot for the comprehensive description. Below is the updated list based on your feedback.

Available metrics:

de_novo: fraction, dampened_fraction
alternative_model: fraction, dampened_fraction
model_sample_compare: fraction, dampened_fraction
level_sample_compare: statistic effect-size & p-value

Recommended metric:

de_novo: fraction, dampened_fraction
alternative_model: fraction, dampened_fraction
model_sample_compare: fraction, dampened_fraction
level_sample_compare: statistic effect-size

The dampened_fraction is especially recommended to avoid false predictions due to coverage biases (low coverage, coverage bumps).

A general set of factors to consider:

modified base frequency, modified base type, combinations of modified bases, reference sequence complexity, reference sequence quality, read coverage, target precision/recall, ...

mmiladi commented 4 years ago

I have one other question regarding the thresholds for the effect-size. The concept is less commonly understood than the p-value. What is a typical threshold that users consider in the context of Tombo results?

marcus1487 commented 4 years ago

This is tricky for many of the same reasons listed above. The effect size metrics also have different ranges and behaviors depending upon the test computed. The effect size metrics are explained here in the docs. I would generally recommend that this value not be used blindly with a fixed threshold across different settings. These values are generally best used as a ranked list or score for further investigation. I realize this leaves a bit up to the user, but there really isn't a great one-size-fits-all way to give reliable advice here.
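As a rough sketch of the "rank list" idea, the snippet below orders sites by effect size and keeps the top candidates for follow-up instead of applying a fixed cutoff. The site labels, the values, and the helper `rank_sites_by_effect_size` are all hypothetical. It also assumes that a larger absolute effect size means a stronger signal, which may not hold for every test.

```python
def rank_sites_by_effect_size(sites, top_n=None):
    """Sort (site, effect_size) pairs by absolute effect size, largest first.

    Assumes larger magnitude means stronger evidence; check this against
    the test you actually ran before relying on the ordering.
    """
    ranked = sorted(sites, key=lambda site: abs(site[1]), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical per-site effect sizes, for illustration only.
sites = [("chr1:100", 0.12), ("chr1:250", -0.85), ("chr2:40", 0.47)]

top_candidates = rank_sites_by_effect_size(sites, top_n=2)
for position, effect_size in top_candidates:
    print(position, effect_size)
```

The ranked candidates can then be inspected further, for example alongside coverage or raw signal plots, rather than being accepted or rejected by a single threshold.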

nanoporetech / tombo

Help for using the proper metric for each mode #266