Closed ChrisKeefe closed 4 years ago
This tests clean locally, but tests will continue to fail here until https://github.com/qiime2/q2-diversity-lib/pull/21 is merged.
This PR is ready for a preliminary review - does anything here need major course correction?
Only beta_phylogenetic
has been converted into a pipeline at this time. I'll continue transitioning other Actions on a separate branch while this is in review, so this doesn't become a moving target.
Edit:
After beginning work on dispatch for nonphylogenetic beta diversity measures, I've decided to move dispatch from diversity-lib
to diversity
, preventing some gross circular dependency action. Removed this from In Review until that's wrapped up.
Cancel that. After a light informal review and discussion with @thermokarst , I'm going to carry on porting all diversity calculations over to diversity-lib
, and handling dispatch there.
Thanks @ChrisKeefe, these q2-diversity/q2-diversity-lib changes are looking great! After spending some time going through these PRs, I have a handful of thoughts, which I will try to outline here.
TLDR: I think there is at least one too many levels of indirection here, and with a little restructuring this code can be simplified conceptually and structurally.
The role of the q2-diversity-lib plugin is to serve as library code for q2-diversity. The q2-diversity-lib code could just as easily live inside of q2-diversity itself, but we chose to factor it out as a plugin for a few reasons, one of them being just for the sake of giving you an opportunity to make a plugin from scratch. With that in mind, I think it is important to emphasize that the use of this q2-diversity-lib library code does not need to be minimized in q2-diversity, in fact, I think that minimizing/masking q2-diversity-lib invocations in q2-diversity is problematic.
If we emphasize that the actions available in q2-diversity-lib are meant to compose together into bigger pipelines in q2-diversity, we get the best of both worlds - compartmentalization of code (in q2-diversity-lib) and easy to follow pipelines (in q2-diversity).
With that in mind, generally speaking:
The proliferation of data-lookups-as-utility-functions-but-only-sometimes is a bit confusing:
alpha.implemented_phylogenetic_measures_dict()
alpha.implemented_phylogenetic_measures()
alpha.all_phylogenetic_measures_alpha
(note not a function)alpha.implemented_nonphylogenetic_measures_dict()
alpha.implemented_nonphylogenetic_measures()
alpha.unimplemented_nonphylogenetic_measures()
alpha.all_nonphylogenetic_measures_alpha()
beta.implemented_nonphylogenetic_metrics_dict()
beta.implemented_nonphylogenetic_metrics()
beta.unimplemented_nonphylogenetic_metrics()
beta.all_nonphylogenetic_measures_beta()
beta.implemented_phylogenetic_metrics_dict()
beta.unimplemented_phylogenetic_metrics_dict()
beta.all_phylogenetic_measures_dict()
beta.all_phylogenetic_measures_beta()
Instead of this, I propose one coherent data structure that encodes all of this information in one pass:
MAPPINGS = {
'PHYLO': {
'IMPL': {'faith_pd'},
'UNIMPL': set(),
},
'NONPHYLO': {
'IMPL': {'observed_features', 'pielou_e', 'shannon'},
'UNIMPL': {'ace', 'chao1', 'chao1_ci', 'berger_parker_d',
'brillouin_d', 'dominance', 'doubles', 'enspie', 'esty_ci',
'fisher_alpha', 'goods_coverage', 'heip_e',
'kempton_taylor_q', 'margalef', 'mcintosh_d', 'mcintosh_e',
'menhinick', 'michaelis_menten_fit', 'osd', 'robbins',
'simpson', 'simpson_e', 'singles', 'strong', 'gini_index',
'lladser_pe', 'lladser_ci'},
},
}
This lends itself well to "natural reading", especially when chaining together:
metrics = MAPPINGS['NONPHYLO']['IMPL'] | MAPPINGS['NONPHYLO']['UNIMPL']
implemented_metrics = MAPPINGS['NONPHYLO']['IMPL']
Also, this has the advantage of only exporting one object, rather than a half dozen loosely related ones.
You also might've noticed that there aren't dictionaries in the tips, I don't think you'll need to include those dicts, and hopefully you'll see why, elsewhere in this review. Right now I am envisioning one MAPPINGS
in the alpha
module, and one in beta
.
Also, I'm open to other names (METRICS
, for example), and different layouts of the data, this is just to get you started with some ideas.
I like the idea of including translations, and I mention it here as a subsection of mappings, but only because I see it as a loosely related idea. I think you will be able to move any translation functionality into the q2-diversity Pipelines, discussed later in this review.
I think the following QIIME 2 Methods should be exposed in this plugin (will reference the path to the view function, for clarity):
alpha.faith_pd
alpha.observed_features
alpha.pielou_evenness
alpha.shannon_entropy
alpha.alpha_passthrough
beta.bray_curtis
beta.jaccard
beta.unweighted_unifrac
beta.weighted_unifrac
beta.beta_passthrough
beta.beta_phylogenetic_passthrough
You'll notice there is no alpha_phylogenetic_passthrough
Method, only because there are no unimplemented alpha phylogenetic metrics in q2-diversity.
alpha_passthrough
This method will handle just the MAPPINGS['NONPHYLO']['UNIMPL']
metrics, and might look something like this:
@_disallow_empty_tables
def alpha_passthrough(table: biom.Table, metric: str) -> pd.Series:
counts = table.matrix_data.toarray().astype(int).T
sample_ids = table.ids(axis='sample')
result = skbio.diversity.alpha_diversity(metric=metric, counts=counts,
ids=sample_ids)
result.name = metric
return result
beta_passthrough
This method will handle just the MAPPINGS['NONPHYLO']['UNIMPL']
metrics.
Same idea as alpha_passthrough
above - defer to skbio, don't worry about fancy signatures, citations, etc. Basically just moving the beta
method over from q2-diversity.
beta_phylogenetic_passthrough
This method will handle just the MAPPINGS['PHYLO']['UNIMPL']
metrics.
Same idea as alpha_passthrough
above - defer to skbio, don't worry about fancy signatures, citations, etc. Basically just moving the beta_phylogenetic
method over from q2-diversity.
There will be none in this plugin. That means removing:
alpha.alpha_dispatch
alpha.alpha_phylogenetic_dispatch
beta.beta_dispatch
beta.beta_phylogenetic_dispatch
The bodies of these pipelines will be moved to q2-diversity.
There will be none in this plugin. That means removing:
alpha.alpha_rarefaction_dispatch
alpha.alpha_rarefaction_phylogenetic_dispatch
beta.skbio_dispatch
beta.unifrac_beta_dispatch
There will be none in this plugin.
Several existing methods will be converted to Pipelines (see below).
The following Methods will be removed:
_alpha.alpha
_alpha.alpha_phylogenetic
_alpha.alpha_phylogenetic_alt
_alpha.alpha_phylogenetic_old
_beta.beta
_beta.beta_phylogenetic
New pipelines (former Methods):
_alpha.alpha
_alpha.alpha_phylogenetic
_beta.beta
_beta.beta_phylogenetic
These pipelines can all basically have the same implementations as the *_dispatch
pipelines that you added to q2-diversity-lib.
alpha
This is more-or-less a copy-paste from your current alpha_dispatch
pipeline:
def alpha(ctx, table, metric, drop_undefined_samples=None):
metrics = MAPPINGS['NONPHYLO']['IMPL'] | MAPPINGS['NONPHYLO']['UNIMPL']
implemented_metrics = MAPPINGS['NONPHYLO']['IMPL']
if metric not in metrics:
raise ValueError("Unknown metric: %s" % metric)
metric = ALPHA_TRANSLATION[metric]
if metric in implemented_metrics:
func = ctx.get_action('diversity_lib', metric)
if 'drop_undefined_samples' in signature(func).parameters:
func = partial(func, table=table,
drop_undefined_samples=drop_undefined_samples)
else:
if drop_undefined_samples:
warnings.warn(f"The {metric} metric does not support dropping "
"undefined samples.")
func = partial(func, table=table)
else:
func = ctx.get_action('diversity_lib', 'alpha_passthrough')
func = partial(func, table=table, metric=metric)
result = func()
return tuple(result)
The following Pipelines shall be used as QIIME 2 Pipelines, as well as vanilla Python code:
_alpha.alpha
_alpha.alpha_phylogenetic
_beta.beta
_beta.beta_phylogenetic
This can be handled by building a dependency-injection mock, to fill in the ctx
argument:
class PyContext:
def __init__(self):
self.mappings = {
'diversity_lib': {
'faith_pd': q2_diversity_lib.alpha.faith_pd,
'observed_features': q2_diversity_lib.alpha.observed_features,
'pielou_e': q2_diversity_lib.alpha.pielou_evenness,
'shannon': q2_diversity_lib.alpha.shannon_entropy,
'alpha_passthrough': q2_diversity_lib.alpha.alpha_passthrough,
'bray_curtis': q2_diversity_lib.beta.bray_curtis,
'jaccard': q2_diversity_lib.beta.jaccard,
# ... etc etc etc
},
}
def get_action(self, plugin, action):
return self.mappings[plugin][action]
The following visualizers will need to be updated to invoke the "Pipelines as Python Functions" listed above:
_alpha.alpha_rarefaction
_beta.beta_rarefaction
Using the PyContext
class above, we can "inject" the vanilla Python functions in:
...
ctx = PyContext()
for d, i in itertools.product(depth_range, iter_range):
rt = rarefy(feature_table, d)
for m in metrics:
if m in phylogenetic_metrics():
rt_p = transform(rt, to_type=BIOMV210Format,
from_type=biom.Table)
vector = alpha_phylogenetic(
ctx=ctx, table=rt_p, metric=m, phylogeny=phylogeny)
else:
vector = alpha(ctx=ctx, table=rt, metric=m,
drop_undefined_samples=False)
data[m][(d, i)] = vector
return data
...
Like I said above, this all looks great, and I think the main parts of the implementation are there, but I think we can do a lot to help streamline it and reduce the complexity of the Pipeline -> Pipeline -> Method workflow that is currently proposed. Let me know if you have any questions or comments. I'm sure I missed one or two things in here, but I have played with all of these proposed changes locally and can confirm that the plan proposed here works.
🦖
Going to merge now - travis is failing because the staging env is missing the latest div-lib. I tested locally and all seems in order, no need to wait 2-3 hrs for the new env.
Closes #280. See also qiime2/q2-diversity-lib#21
Closes #284.
Note: Renames "weighted_unifrac" to "weighted_unnormalized_unifrac", aligningq2-diversity
withq2-diversity-lib
and the underlyingunifrac
package. This is a breaking API change.