Closed maxibor closed 2 years ago
Great idea with the groupby. Keep in mind, though, that the rank names will depend on the taxonomy of the database used. So I would not rely on these particular names. Maybe you can exclude "unclassified" and "root" (I think those are fairly universal) and then group on all other entries present in the column?
Great idea with the groupby. Keep in mind, though, that the rank names will depend on the taxonomy of the database used. So I would not rely on these particular names. Maybe you can exclude "unclassified" and "root" (I think those are fairly universal) and then group on all other entries present in the column?
These are actually harcoded in Metaphlan, see: https://github.com/biobakery/MetaPhlAn/blob/a857bccdac6bd2b26997e25bb9692d79736c1ed8/metaphlan/utils/add_metadata_tree.py#L71-75
EDIT: Ignore this, just saw you setup the test with Tox ;) @Midnighter I've added an integration test, but I'm not sure I understand why the tests are failing on the GH CI :(
They pass fine (for metaphlan) on my machine:
(taxpasta_dev) $ pytest
============================= test session starts ==============================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/maxime/Documents/github/taxpasta, configfile: tox.ini, testpaths: tests
collected 26 items
tests/integration/test_kraken2_etl.py ....FF [ 23%]
tests/integration/test_metaphlan.py .. [ 30%]
tests/unit/infrastructure/application/test_kraken2_profile.py ..FFF.FF.. [ 69%]
F..F [ 84%]
tests/unit/infrastructure/application/test_kraken2_profile_reader.py ... [ 96%]
. [100%]
[...]
=========================== short test summary info ============================
FAILED tests/integration/test_kraken2_etl.py::test_kraken2_etl[2611_se-ERR5766174-db1-invalid.kraken2.report.txt]
FAILED tests/integration/test_kraken2_etl.py::test_kraken2_etl[2612_pe-ERR5766176-db1-invalid.kraken2.report.txt]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_column_presence[columns2]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_column_presence[columns3]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_column_presence[columns4]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_percent[table1]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_percent[table2]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_clade_assigned_reads[table2]
FAILED tests/unit/infrastructure/application/test_kraken2_profile.py::test_direct_assigned_reads[table2]
========================= 9 failed, 17 passed in 1.57s =========================
Flake8 is failing (try it locally with tox -e flake8
)
src/taxpasta/infrastructure/application/metaphlan_profile.py:19:1: I003 isort expected 1 blank line in imports, found 0
src/taxpasta/infrastructure/application/metaphlan_profile.py:23:1: I001 isort found an import in the wrong position
src/taxpasta/infrastructure/application/metaphlan_profile.py:24:1: I003 isort expected 1 blank line in imports, found 0
src/taxpasta/infrastructure/application/metaphlan_profile.py:60:1: D106 Missing docstring in public nested class
src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:19:1: F401 'ctypes.wintypes.LARGE_INTEGER' imported but unused
src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:20:1: F401 'pathlib.Path' imported but unused
src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:27:1: I001 isort found an import in the wrong position
src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:28:1: I005 isort found an unexpected missing import
src/taxpasta/infrastructure/application/metaphlan_profile_reader.py:33:5: F811 redefinition of unused 'LARGE_INTEGER' from line 19
tests/integration/test_metaphlan.py:22:1: F401 'pandera.errors.SchemaErrors' imported but unused
Hey @Midnighter, Sorry for the lack of upadtes, I've been really busy the last two weeks running analysis and preparing results for the upcoming conferences. I've added invalid test data, and remove the occurences of Kraken2 in the docstrings
I also removed the vscode settings
Hmm, Now the tests from centrifuge are failing ❓
Is it because you do not have the dev merged into your PR? 🤔
Is it because you do not have the dev merged into your PR? 🤔
It came from the merge conflict that got rid of the pytest fixture for centrifuge. I've readded it, and it's now 👍
:exclamation: No coverage uploaded for pull request base (
dev@484978c
). Click here to learn what that means. Patch has no changes to coverable lines.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
That was a painful rebase 🥵 Luckily, after getting lost in all the merge conflicts twice, this did the trick:
$ git checkout my-feature-branch
$ git fetch
$ git merge origin/dev # 1
$ git reset --soft origin/dev # 2
$ git commit -va # 3
$ git push -f origin my-feature-branch # 4
Since it's my first attempt at Pandera, I'm not sure I got it 100% right @Midnighter
Metaphlan reports by default relative abundances, which should sum up to 100%, but at each taxonomic rank (hence the groupby).