qiime2 / q2-diversity-lib

BSD 3-Clause "New" or "Revised" License
0 stars 11 forks source link

BUG: After 2022.8 faith_pd_vector.qza comes with a #SampleID index/column name #57

Closed mestaki closed 3 months ago

mestaki commented 1 year ago

After 2022.8 I noticed that the table output of core-metrics' faith_pd_vector.qza has a #SampleID as its first column name, while the other vectors continue to have a blank column/index name. Not an issue within the Q2 environment but it does break some external pipelines, for example if you are importing into R expecting them all to be similar in structure.

ebolyen commented 1 year ago

Thanks @mestaki, that's super strange, we'll need to reproduce this on our end, as it doesn't really make sense that this would happen.

ebolyen commented 1 year ago

Oh, it may be the unifrac package adding this actually.

mestaki commented 1 year ago

I think you may be right, I vaguely remember arriving at the same conclusion, I think default unifrac changed at that release to the faster one?

lizgehret commented 1 year ago

Hi @mestaki! Thanks for reporting this - I was able to reproduce on my end using 2023.7. This will be a 'good first issue' candidate for a new RSE1 we have joining the team in a few weeks, so I'm going to add some additional details below for their troubleshooting.

Steps to reproduce this issue:

  1. Install/activate a QIIME 2 environment that is >2022.8 (I used 2023.7)
  2. Grab sample metadata and feature table/rooted tree artifacts from the Moving Pictures tutorial.
  3. Run qiime diversity-lib faith-pd using the feature table and rooted tree artifacts.
  4. Unzip the resultant artifact, and open the alpha-diversity.tsv file within the data directory.

The #SampleID value will be present as the column name above the sample IDs in this table.

Another way to reproduce this issue would be to run qiime diversity core-metrics-phylogenetic using the metadata and artifacts listed above, and then unzipping the faith-pd artifact that is one of the outputs from running this pipeline. The results will be the same, since this is using the same underlying method.

The #SampleID value should be removed, so that there is no column header for the sample IDs.

cherman2 commented 6 months ago

Adding this back to Triage so It doesn't get lost because a user noticed on the forum today. Forum ref

colinbrislawn commented 6 months ago

I just ran into this too image

This is perfect for a Good First Issue if we are doing those

lizgehret commented 6 months ago

Turns out this one is a bit trickier than we initially anticipated - notes from an old pairing session can be found on this branch for reference. We will address this when @cherman2 and I are back from Europe!

cherman2 commented 4 months ago

We will address this issue in our next release, so I am moving this to our 2024.10 release project board.

lizgehret commented 3 months ago

This has been fixed, thanks @mestaki for reporting! If you need the updated functionality prior to our next release (2024.10) you can install a development environment once this weekend's new environment files are built.