Closed lizgehret closed 1 year ago
@nbokulich, do you want to review this one too, or should we merge when it's ready?
hey thanks for pinging. I am traveling this week and next so do not think I can review this in time for the next release! So please move ahead without me. Thanks!
I think it makes sense to provide group_column when _load_metadata is called in pairwise_differences as well (line 47). It looks like the filtering is handled elsewhere there (since that action worked for me) but this issue stemmed from inconsistent metadata handling between the two functions, so loading the metadata in the same way seems like it might avoid future issues with that. Just let me know if I'm missing a reason why we shouldn't do that though.
Thanks for bringing that up @gregcaporaso - this actually brings up an issue that I need to investigate a bit further, which is why pairwise-distances
was failing while pairwise-differences
wasn't (with the data you provided in the example above). They both call the same validation helper method _validate_input_values()
which is where pairwise-distances
fails. They should both exhibit the same behavior - but I'm probably missing something that is the source of this discrepancy. In any case, I'm going to investigate that further and will circle back once that's sorted out!
Okay circling back on this @gregcaporaso - I examined both methods in detail, and it looks like you were only missing that error message in pairwise-differences
because you were including multiple files as --m-metadata-file
arguments, which calls the merge_metadata
method, taking only the intersection between the two files (thus removing the IDs without any associated data). So this parameter in _load_metadata()
will be fine to add to both methods!
I'll work on writing up a few unit tests that validate behavior, and then we should be all set here. 🙂
Alright so I've added a single unit test that tests the _load_metadata()
helper function in isolation - this covers functionality for both pairwise_differences
and pairwise_distances
methods. We should be all set here, but let me know if there's anything else I've missed!
So final comment/additions to this PR for posterity - @gregcaporaso and I met this morning and discussed adding two additional unit tests that run pairwise-distances
and pairwise-differences
with metadata that contains nans in the group column that will fail if group_column
is ever removed from _load_metadata()
. Pending passing CI, we should be all set here!
Addresses #172