qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License
4 stars 45 forks source link

Add tests catching table data mutations in pipelines and alpha-rarefaction #285

Closed ChrisKeefe closed 4 years ago

ChrisKeefe commented 4 years ago

Improvement Description Add unit/reversion tests to diversity pipelines and "informal pipelines" like alpha-rarefaction capable of detecting unintended mutations of table data.

Current Behavior biom.Table.pa() and biom.Table.norm() mutate data by default, and require setting inplace=False to prevent mutation. Overlooking this parameter once in q2_diversity_lib nearly caused a data integrity bug in core-metrics and core-metrics-phylogenetic in 2020.6, in which q2_diversity_lib.observed_features would have mutated table data used in calculating other diversity measures downstream.

This bug was prevented by QIIME 2's general resistance to incidental data mutation :sweat_smile:. However, "informal" pipelines like alpha-rarefaction, which import and run vanilla python functions without using ctx are still susceptible and should test for misbehavior. Augmenting the core-metrics testing to catch reversions might also be useful.

Proposed Behavior Tests on alpha-rarefaction and possibly core-metrics that catch table mutations with at least the default measures. If there's a way of checking all measures for this without significantly increasing test times, that would be great.

Questions

  1. Is there a way of testing this that provides adequate safety without brute-force permutation testing of all available measures?
ChrisKeefe commented 4 years ago

This issue is being closed. sdk.Context() appears to make it safe to inject context within Visualizers (e.g. q2_diversity.alpha_rarefaction), alleviating the concern. It's still possible to break things if a dev creates an "informal pipeline" by importing vanilla python functions, but this approach is neither recommended nor supported.

If this kind of test becomes necessary at any point in the future, a brute-force test isn't as bad as expected: