sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/
Other
476 stars 79 forks source link

MRG: when lingroups are provided, use them for `csv_summary` #3311

Closed bluegenes closed 1 month ago

bluegenes commented 3 months ago

Currently, when we generate a csv_summary with LINs, we get a summary at every single LIN rank, which is a lot of results and not very helpful. LINgroups are our way of linking the LINs (e.g. 14;1;0;0;0;0;0;0;0;0) to a known name/taxonomic group (e.g. "Phylotype I").

This PR changes the behavior of csv_summary when a lingroup file is provided, limiting summarized reporting to just the named lingroups. While the output is very similar to the lingroup output we already have, the most important difference is that the sample name is included in the output, meaning that we get intelligible results when running tax metagenome on more than one sample.

Prior tax metagenome behavior was to always generate a lingroup output file when a lingroups file is provided. Here, I disable that for multiple queries, since the results wouldn't make sense. I do not replace it with another default, but I did add a recommendation to the help + doc.

In the future, we could consider changing the default lingroup output to csv_summary, since it's actually useful for multiple files. Or, we could modify the lingroup output to include query information.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.40%. Comparing base (5acf698) to head (e3c3bb6). Report is 1 commits behind head on latest.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## latest #3311 +/- ## ========================================== + Coverage 86.45% 92.40% +5.94% ========================================== Files 137 104 -33 Lines 16070 12925 -3145 Branches 2211 2219 +8 ========================================== - Hits 13894 11943 -1951 + Misses 1869 675 -1194 Partials 307 307 ``` | [Flag](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3311/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | Coverage Δ | | |---|---|---| | [hypothesis-py](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3311/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `25.43% <8.00%> (-0.04%)` | :arrow_down: | | [python](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3311/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `92.40% <100.00%> (+<0.01%)` | :arrow_up: | | [rust](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3311/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `?` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bluegenes commented 2 months ago

@sourmash-bio/devs ready for review

ctb commented 2 months ago

oh! I wanted to suggest that you put the suggested changes to behavior in the PR description into new issues, too; I think they require a major version bump?

bluegenes commented 1 month ago

looks good - esp appreciate the documentation update.

there's some missing code coverage - is this just buggy codecov? I haven't dug in at all.

looks like it was just buggy codecov!

bluegenes commented 1 month ago

oh! I wanted to suggest that you put the suggested changes to behavior in the PR description into new issues, too; I think they require a major version bump?

now in #3361