taxprofiler / taxpasta

TAXnomic Profile Aggregation and STAndardisation
https://taxpasta.readthedocs.io/
Apache License 2.0

[BUG] CRITICAL error when parsing MetaPhlAn4's output #111

Closed apcamargo closed 1 year ago

apcamargo commented 1 year ago

Problem description

I'm getting the following error message when trying to parse MetaPhlAn4's outputs with taxpasta.

[21:40:18] CRITICAL Error in sample 'ERR7569997' with profile 'metaphlan/ERR7569997.txt'.          merge.py:422
           CRITICAL     schema_context  ... index                                                  merge.py:425
                    4  DataFrameSchema  ...  None
                    0           Column  ...     0
                    1           Column  ...  None
                    2           Column  ...  None
                    3           Column  ...  None

                    [5 rows x 6 columns]

I wonder if this has anything to do with the non-standard taxa in MetaPhlAn4's output (see example below).

k__Bacteria|p__Atribacterota|c__CFGB8897|o__OFGB8897|f__FGB8897 2|67818|||      0.01864
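To check a profile for such entries, a minimal Python sketch along the following lines could list the ranks that have no tax ID. It assumes the layout seen in the example line above: a `|`-separated clade name, a `|`-separated tax ID lineage, and the relative abundance, separated by whitespace. The function names are hypothetical and this is not how taxpasta itself parses the file.

```python
from typing import List, Tuple


def parse_profile_line(line: str) -> Tuple[str, List[str], float]:
    """Split one profile line into clade name, per-rank tax IDs, and abundance.

    Assumes whitespace-separated fields in the order shown in the example:
    clade name, tax ID lineage, relative abundance.
    """
    fields = line.split()
    clade_name, taxid_lineage, abundance = fields[0], fields[1], float(fields[2])
    return clade_name, taxid_lineage.split("|"), abundance


def ranks_without_taxid(line: str) -> List[str]:
    """Return the ranks of the clade name whose tax ID entry is empty."""
    clade_name, taxids, _ = parse_profile_line(line)
    ranks = clade_name.split("|")
    # Pair each rank with the tax ID at the same position in the lineage.
    return [rank for rank, taxid in zip(ranks, taxids) if not taxid]


example = (
    "k__Bacteria|p__Atribacterota|c__CFGB8897|o__OFGB8897|f__FGB8897 "
    "2|67818|||      0.01864"
)
print(ranks_without_taxid(example))
# → ['c__CFGB8897', 'o__OFGB8897', 'f__FGB8897']
```

In the example, only the kingdom and phylum carry a tax ID; the class, order, and family entries are empty.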

Full MetaPhlAn4 output available here.

Code sample

taxpasta merge -o taxpasta/taxpasta.tsv --taxonomy /home/geoadmin/geonomecontainer/KMCP/taxdump --profiler metaphlan --add-lineage metaphlan/*.txt

Environment

### Package Information

| Package  | Version |
|:---------|--------:|
| taxpasta |   0.3.0 |

### Dependency Information

| Package                      |     Version |
|:-----------------------------|------------:|
| bash-kernel                  | **missing** |
| biom-format                  |      2.1.15 |
| depinfo~                     | **missing** |
| jupyter                      | **missing** |
| mkdocs-awesome-pages-plugin~ | **missing** |
| mkdocs-exclude~              | **missing** |
| mkdocs-material~             | **missing** |
| mkdocstrings[python]~        | **missing** |
| numpy~                       | **missing** |
| odfpy                        | **missing** |
| openpyxl                     | **missing** |
| pandas~                      | **missing** |
| pandera~                     | **missing** |
| pre-commit                   | **missing** |
| pyarrow                      |      11.0.0 |
| rich                         |      13.4.2 |
| tabulate~                    | **missing** |
| taxopy~                      | **missing** |
| tox~                         | **missing** |
| typer~                       | **missing** |

### Build Tools Information

| Package    | Version |
|:-----------|--------:|
| pip        |  23.0.1 |
| setuptools |  67.4.0 |
| wheel      |  0.38.4 |

### Platform Information

|         |                          |
|:--------|-------------------------:|
| Linux   | 5.15.0-1037-azure-x86_64 |
| CPython |                   3.11.0 |

Anything else?

No response

jfy133 commented 1 year ago

Thanks @apcamargo! Fortunately, mp4 support is [already coming](https://github.com/taxprofiler/taxpasta/pull/107)!

It's very helpful that you've shared an example output. Would you happen to have a couple more example output files? More examples were requested on the PR linked above, which adds this support to the tool :)

apcamargo commented 1 year ago

Sure! ERR7569997.txt ERR7569998.txt