BUG: Exposes 'ignore_missing_samples' parameter in core metrics from emperor

hagenjp commented 10 months ago

@lizgehret just waiting on CI but this is ready for your review

lizgehret commented 10 months ago

Hey @hagenjp - couple of things came up while I was testing. Quick overview below, but let's meet on Monday to work on this together!

When running core-metrics using the steps listed in the original issue, the following error is raised:
When running core-metrics-phylogenetic, the ignore_missing_samples parameter is not recognized.

Offhand, I'm not sure why 1. is happening - but I know why 2. is happening. These will be good things to pair on, and this is a great example of why we always test locally before merging - all of the existing tests pass, but this functionality isn't working like we want it to 🙃

lizgehret commented 10 months ago

Okay so for posterity - @hagenjp and I met to discuss the index cannot be a set error shown in the screenshot above.

The issue seems to be coming from the _validate_metadata helper step within emperor, specifically when index is assigned to the difference variable, which is assumed to be a set. pandas >=1.5 does not allow sets to be used in indices or columns, since they are not 'array-like' - which was a breaking change.

We tested that the error was related to this change in pandas 1.5 by creating a QIIME 2 core 2022.8 environment (our latest release prior to upgrading to pandas 1.5). One side note is that we had to set the conda --solver flag to classic in order to utilize our 2022.8 environment file (the libmamba solver couldn't solve the environment). That's a separate issue to discuss, but adding here for the record.

Once in a 2022.8 environment, we created a local branch for q2-diversity based off of the 2022.8 release commit and then cherry-picked @hagenjp's commits from this PR to ensure that we were working in a 2022.8 version of q2-diversity that also included the changes from this PR. We then re-ran core-metrics with the steps described in the associated issue, which were successful.

Next steps are to determine if it's possible to get around this issue in q2-emperor, or if we need to hold off on this PR until the necessary changes are made in emperor (if this is the case, we'll open an issue there).

@hagenjp I'd recommend reviewing everything we did above, and seeing if you can replicate all of this on your end without me. This is a great learning experience with traceback reading, conda, and git 🙂

lizgehret commented 10 months ago

Okay so final update on this:

We cannot workaround this in q2-emperor, and the culprit lies here within emperor. However, a fix has been rolled out here so that the difference object is converted to a list before being passed into the index param. These changes won't be available until the next version of emperor is released, which looks like (from this commit) is slated for July 2024. So we'll have to wait until we can upgrade to this version of emperor before this PR will pass.

We'll leave this open and marked as blocked until we upgrade to the next version of emperor.

ElDeveloper commented 10 months ago

@lizgehret thanks for tracking this down. I just noticed I never pushed a release to GitHub when I published 1.0.4 to PyPI (oops!). This is fixed, and I've pushed a pull request to q2-emperor:

https://github.com/qiime2/q2-emperor/pull/108

lizgehret commented 10 months ago

@lizgehret thanks for tracking this down. I just noticed I never pushed a release to GitHub when I published 1.0.4 to PyPI (oops!). This is fixed, and I've pushed a pull request to q2-emperor:

qiime2/q2-emperor#108

Thanks @ElDeveloper! We'll hold off merging that PR until emperor 1.0.4 is available on conda - you're expecting that to be sometime in July of this year?

ElDeveloper commented 10 months ago

@lizgehret Better than that, 1.0.4 has been available via conda-forge for the past 6 months: https://anaconda.org/conda-forge/emperor/files

🎉 🐧

lizgehret commented 10 months ago

@ElDeveloper oh wow, that didn't come up when I searched for available emperor builds on anaconda - in any case, this is great! Thanks! I'll get this PR merged 🙂

lizgehret commented 10 months ago

EDIT: Okay so additional follow up re: emperor 1.0.4 - I created a 2023.9 amplicon environment with @ElDeveloper's PR branch installed locally (that updates emperor within q2-emperor's recipe to 1.0.4). Conda can solve the environment, so this should be compatible with the amplicon distro. q2-emperor's tests all pass locally w/this version, ~~but we have 3 test failures in q2-diversity to examine (the only other plugin w/an indirect dependency on emperor via q2-emperor)~~ and all tests pass in q2-diversity as well. The functionality in this PR works as expected with the updated version of emperor. ~~So once the test failures in q2-diversity are sorted,~~ We should be good to merge the version pin update PR as well as this one!

lizgehret commented 10 months ago

Okay @hagenjp, I've gone ahead and merged the version pin update PR in q2-emperor. So remaining to-do's here are:

[x] When running core-metrics-phylogenetic, the ignore_missing_samples parameter is not recognized. This parameter is missing somewhere within the core-metrics-phylogenetic pipeline - take a look in there and add that where it's needed 🙂
[x] Add tests for this new exposed parameter - we'll want to assert that we get a failure in core-metrics and core-metrics-phylogenetic for the original use-case, and that enabling this flag will resolve that failure in each case.

Make sure to pull down and install the latest version of q2-emperor with the updated pin in your development environment, and conda install -c conda-forge emperor=1.0.4 in that same environment. You should then be able to effectively test all of this locally! Lmk if you have any questions 🙂

hagenjp commented 10 months ago

@lizgehret If this passes after the build on Sunday it will be ready for your review! Thanks :)

qiime2 / q2-diversity

BUG: Exposes 'ignore_missing_samples' parameter in core metrics from emperor #348