taxprofiler / taxpasta

TAXnomic Profile Aggregation and STAndardisation
https://taxpasta.readthedocs.io/
Apache License 2.0
34 stars 7 forks source link

Add metaphlan taxonomic profiler application #21

Closed maxibor closed 2 years ago

maxibor commented 2 years ago

Since it's my first attempt at Pandera, I'm not sure I got it 100% right @Midnighter

Metaphlan reports by default relative abundances, which should sum up to 100%, but at each taxonomic rank (hence the groupby).

Midnighter commented 2 years ago

Great idea with the groupby. Keep in mind, though, that the rank names will depend on the taxonomy of the database used. So I would not rely on these particular names. Maybe you can exclude "unclassified" and "root" (I think those are fairly universal) and then group on all other entries present in the column?

maxibor commented 2 years ago

Great idea with the groupby. Keep in mind, though, that the rank names will depend on the taxonomy of the database used. So I would not rely on these particular names. Maybe you can exclude "unclassified" and "root" (I think those are fairly universal) and then group on all other entries present in the column?

These are actually harcoded in Metaphlan, see: https://github.com/biobakery/MetaPhlAn/blob/a857bccdac6bd2b26997e25bb9692d79736c1ed8/metaphlan/utils/add_metadata_tree.py#L71-75

maxibor commented 2 years ago

EDIT: Ignore this, just saw you setup the test with Tox ;) @Midnighter I've added an integration test, but I'm not sure I understand why the tests are failing on the GH CI :(

They pass fine (for metaphlan) on my machine:

(taxpasta_dev) $ pytest
============================= test session starts ==============================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/maxime/Documents/github/taxpasta, configfile: tox.ini, testpaths: tests
collected 26 items

tests/integration/test_kraken2_etl.py ....FF                             [ 23%]
tests/integration/test_metaphlan.py ..                                   [ 30%]
tests/unit/infrastructure/application/test_kraken2_profile.py ..FFF.FF.. [ 69%]
F..F                                                                     [ 84%]
tests/unit/infrastructure/application/test_kraken2_profile_reader.py ... [ 96%]
.                                                                        [100%]
[...]
=========================== short test summary info ============================
FAILED tests/integration/test_kraken2_etl.py::test_kraken2_etl[2611_se-ERR5766174-db1-invalid.kraken2.report.txt]
FAILED tests/integration/test_kraken2_etl.py::test_kraken2_etl[2612_pe-ERR5766176-db1-invalid.kraken2.report.txt]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_column_presence[columns2]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_column_presence[columns3]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_column_presence[columns4]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_percent[table1]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_percent[table2]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_clade_assigned_reads[table2]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_direct_assigned_reads[table2]
========================= 9 failed, 17 passed in 1.57s =========================
Midnighter commented 2 years ago

Flake8 is failing (try it locally with tox -e flake8)

   src/taxpasta/infrastructure/application/metaphlan_profile.py:19:1: I003 isort expected 1 blank line in imports, found 0
  src/taxpasta/infrastructure/application/metaphlan_profile.py:23:1: I001 isort found an import in the wrong position
  src/taxpasta/infrastructure/application/metaphlan_profile.py:24:1: I003 isort expected 1 blank line in imports, found 0
  src/taxpasta/infrastructure/application/metaphlan_profile.py:60:1: D106 Missing docstring in public nested class
  src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:19:1: F401 'ctypes.wintypes.LARGE_INTEGER' imported but unused
  src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:20:1: F401 'pathlib.Path' imported but unused
  src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:27:1: I001 isort found an import in the wrong position
  src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:28:1: I005 isort found an unexpected missing import
  src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:33:5: F811 redefinition of unused 'LARGE_INTEGER' from line 19
  tests/integration/test_metaphlan.py:22:1: F401 'pandera.errors.SchemaErrors' imported but unused
maxibor commented 2 years ago

Hey @Midnighter, Sorry for the lack of upadtes, I've been really busy the last two weeks running analysis and preparing results for the upcoming conferences. I've added invalid test data, and remove the occurences of Kraken2 in the docstrings

maxibor commented 2 years ago

I also removed the vscode settings

maxibor commented 2 years ago

Hmm, Now the tests from centrifuge are failing ❓

sofstam commented 2 years ago

Is it because you do not have the dev merged into your PR? 🤔

maxibor commented 2 years ago

Is it because you do not have the dev merged into your PR? 🤔

It came from the merge conflict that got rid of the pytest fixture for centrifuge. I've readded it, and it's now 👍

codecov-commenter commented 2 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (dev@484978c). Click here to learn what that means. Patch has no changes to coverable lines.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #21 +/- ## ====================================== Coverage ? 89.29% ====================================== Files ? 22 Lines ? 271 Branches ? 8 ====================================== Hits ? 242 Misses ? 24 Partials ? 5 ``` Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

maxibor commented 2 years ago

That was a painful rebase 🥵 Luckily, after getting lost in all the merge conflicts twice, this did the trick:

$ git checkout my-feature-branch
$ git fetch
$ git merge origin/dev              # 1
$ git reset --soft origin/dev       # 2
$ git commit -va                       # 3
$ git push -f origin my-feature-branch # 4