qiime2 / q2-longitudinal

QIIME 2 plugin for paired sample comparisons
BSD 3-Clause "New" or "Revised" License
9 stars 18 forks source link

bug: `pairwise-distances` fails when samples without group are in metadata but not distance matrix #174

Closed lizgehret closed 1 year ago

lizgehret commented 1 year ago

Addresses #172

gregcaporaso commented 1 year ago

@nbokulich, do you want to review this one too, or should we merge when it's ready?

nbokulich commented 1 year ago

hey thanks for pinging. I am traveling this week and next so do not think I can review this in time for the next release! So please move ahead without me. Thanks!

lizgehret commented 1 year ago

I think it makes sense to provide group_column when _load_metadata is called in pairwise_differences as well (line 47). It looks like the filtering is handled elsewhere there (since that action worked for me) but this issue stemmed from inconsistent metadata handling between the two functions, so loading the metadata in the same way seems like it might avoid future issues with that. Just let me know if I'm missing a reason why we shouldn't do that though.

Thanks for bringing that up @gregcaporaso - this actually brings up an issue that I need to investigate a bit further, which is why pairwise-distances was failing while pairwise-differences wasn't (with the data you provided in the example above). They both call the same validation helper method _validate_input_values() which is where pairwise-distances fails. They should both exhibit the same behavior - but I'm probably missing something that is the source of this discrepancy. In any case, I'm going to investigate that further and will circle back once that's sorted out!

lizgehret commented 1 year ago

Okay circling back on this @gregcaporaso - I examined both methods in detail, and it looks like you were only missing that error message in pairwise-differences because you were including multiple files as --m-metadata-file arguments, which calls the merge_metadata method, taking only the intersection between the two files (thus removing the IDs without any associated data). So this parameter in _load_metadata() will be fine to add to both methods!

I'll work on writing up a few unit tests that validate behavior, and then we should be all set here. 🙂

lizgehret commented 1 year ago

Alright so I've added a single unit test that tests the _load_metadata() helper function in isolation - this covers functionality for both pairwise_differences and pairwise_distances methods. We should be all set here, but let me know if there's anything else I've missed!

lizgehret commented 1 year ago

So final comment/additions to this PR for posterity - @gregcaporaso and I met this morning and discussed adding two additional unit tests that run pairwise-distances and pairwise-differences with metadata that contains nans in the group column that will fail if group_column is ever removed from _load_metadata(). Pending passing CI, we should be all set here!