qiime2 / q2-metadata

BSD 3-Clause "New" or "Revised" License

tabulate fails on duplicate column names #26

Open gregcaporaso opened 6 years ago

gregcaporaso commented 6 years ago

This would be nice to fix because sometimes we may want to merge closely related data. For example, it would be useful to be able to use qiime metadata tabulate to compare taxonomy assignments generated against Silva and Greengenes by passing two FeatureData[Taxonomy] files.

thermokarst commented 6 years ago

This is a framework-level problem --- duplicate column names aren't allowed in Metadata.

gregcaporaso commented 6 years ago

I'm thinking tabulate might be able to work around that if it identifies an issue, maybe by renaming the columns (which could be an option that is disabled by default).
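
A minimal sketch of the renaming idea, in plain Python with no QIIME 2 calls: any column name that appears in more than one input gets a suffix taken from a label for that input. The function name `disambiguate_columns`, the source labels, and the `name (label)` suffix format are all hypothetical and not part of q2-metadata; the `Taxon` and `Confidence` columns are only illustrative.

```python
from collections import Counter


def disambiguate_columns(columns_by_source):
    """Rename columns that occur in more than one source.

    columns_by_source maps a source label (hypothetical, e.g. "silva")
    to the list of column names found in that source.
    Returns the same mapping with overlapping names suffixed by the label.
    """
    # Count in how many sources each column name appears.
    counts = Counter(
        name for columns in columns_by_source.values() for name in set(columns)
    )
    return {
        label: [
            f"{name} ({label})" if counts[name] > 1 else name
            for name in columns
        ]
        for label, columns in columns_by_source.items()
    }


renamed = disambiguate_columns({
    "silva": ["Taxon", "Confidence"],
    "greengenes": ["Taxon", "Confidence"],
})
print(renamed["silva"])       # ['Taxon (silva)', 'Confidence (silva)']
print(renamed["greengenes"])  # ['Taxon (greengenes)', 'Confidence (greengenes)']
```

Columns that are unique to one input keep their original names, so the renaming only touches the names that would otherwise collide.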

mentorwan commented 6 years ago

This is Yunhu at NCI. Here is one example from one run; I have 26 runs for this project.

qiime metadata tabulate --m-input-file 170811_M01354_0078_000000000-B8G2C-dada2-stats.qza --o-visualization test.qzv

It would be nice if I could input multiple files together to generate a final table, but the error is a duplicate column name. I can export each one to a table and merge them. Would it be possible to merge all the qza files, just like SV tables and sequence tables? Thanks!

thermokarst commented 6 years ago

Thanks for the use-case example, @mentorwan!

@gregcaporaso - your idea makes sense, but this problem impacts all kinds of other methods (e.g. a few q2-longitudinal actions accept alpha diversity as metadata - it is impossible to compare multiple alpha div metrics at once because of this same problem). Seems like it would be easier (and more predictable for users) if we fixed this in the framework.

gregcaporaso commented 6 years ago

> but this problem impacts all kinds of other methods

Makes sense, wasn't thinking about this. So should I open this issue on the framework instead (or maybe we even have one for it already)?

MADscientist314 commented 4 years ago

I am having a similar issue trying to merge the stats from 9 different dada2 runs on a gridded environment.

qiime metadata tabulate --m-input-file $stats --o-visualization ./test/visualization/merged-stats-dada2.qzv

There was an issue with merging QIIME 2 Metadata:

Cannot merge metadata with overlapping columns. The following columns overlap: 'input', 'filtered', 'percentage of input passed filter', 'denoised', 'non-chimeric', 'percentage of input non-chimeric'
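
For the dada2 stats case, where every run has the same columns but (presumably) different sample IDs, one possible workaround is the export-and-merge route mentioned earlier in the thread: stack the rows instead of joining the columns. The following is a minimal sketch using only the Python standard library, assuming each run's stats have already been exported to TSV text with identical headers. The function name `concat_stats` is hypothetical, the skipping of a `#q2:types` directive line is an assumption about the exported file layout, and the sample values are made up for illustration.

```python
import csv
import io


def concat_stats(tsv_texts):
    """Stack rows from several TSV tables that share one header."""
    header = None
    rows = []
    for text in tsv_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        current_header = next(reader)
        if header is None:
            header = current_header
        elif current_header != header:
            raise ValueError("headers differ between tables")
        for row in reader:
            # Skip blank lines and directive lines such as '#q2:types'
            # (assumed layout of an exported metadata file).
            if not row or row[0].startswith("#q2:"):
                continue
            rows.append(row)
    return header, rows


# Made-up example data for two runs.
run_a = "sample-id\tinput\tfiltered\n#q2:types\tnumeric\tnumeric\nS1\t100\t90\n"
run_b = "sample-id\tinput\tfiltered\n#q2:types\tnumeric\tnumeric\nS2\t200\t150\n"
header, rows = concat_stats([run_a, run_b])
print(header)
print(rows)
```

This sketch does not check for sample IDs repeated across runs; if the same ID can occur in more than one run, the rows would need a run label added before stacking.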