qiime2 / q2-metadata

BSD 3-Clause "New" or "Revised" License

IMP: expand support for metadata merging #60

Closed gregcaporaso closed 1 year ago

gregcaporaso commented 1 year ago

This pull request is intended to be a starting point for more extensive metadata merging support in QIIME 2. This PR depends on https://github.com/qiime2/q2-types/pull/297.

Add support for merging metadata in three cases: the two inputs have overlapping ids, overlapping columns, or neither. Merging is not supported when the inputs have both overlapping ids and overlapping columns, because column values could then conflict for specific samples, which is considerably more complex to handle. The result is the union (i.e., outer join) of the ids and columns from the two metadata inputs.
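To make the merge rule above concrete, here is a minimal pure-Python sketch. It is not the code in this PR: the function name `merge_metadata` and the dict-of-rows representation (`{sample_id: {column: value}}`) are assumptions chosen for illustration, and missing cells are filled with `None`.

```python
def merge_metadata(md1, md2):
    """Union (outer join) of two metadata tables given as {sample_id: {column: value}}.

    Hypothetical helper for illustration only; not the q2-metadata implementation.
    """
    ids1, ids2 = set(md1), set(md2)
    cols1 = {col for row in md1.values() for col in row}
    cols2 = {col for row in md2.values() for col in row}

    # Reject the one unsupported case: shared ids AND shared columns,
    # since a sample could then carry two different values for one column.
    if (ids1 & ids2) and (cols1 & cols2):
        raise ValueError("inputs overlap in both ids and columns")

    columns = sorted(cols1 | cols2)
    merged = {}
    for sample_id in sorted(ids1 | ids2):
        # At most one input supplies any given (id, column) cell, so a plain
        # dict union cannot overwrite a value here.
        row = {**md1.get(sample_id, {}), **md2.get(sample_id, {})}
        merged[sample_id] = {col: row.get(col) for col in columns}
    return merged


# Two rows from each example file in this PR description.
md1 = {
    "L2S155": {"barcode-sequence": "ACGATGCGACCA", "body-site": "left palm"},
    "L3S242": {"barcode-sequence": "ACAGTTGCGCGA", "body-site": "right palm"},
}
md2 = {
    "L1S8": {"body-site": "gut", "year": 2008},
    "L1S57": {"body-site": "gut", "year": 2009},
}
merged = merge_metadata(md1, md2)
for sample_id, row in merged.items():
    print(sample_id, row)
```

Here the inputs share the `body-site` column but no ids, so the merge succeeds; passing the same table twice would raise `ValueError`.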

This action currently works on only two metadata inputs at a time, because providing an arbitrary number of metadata files at once would trigger the built-in metadata merging functionality (which would intersect the ids and fail on overlapping column names).
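Since the action takes two inputs at a time, more than two files could in principle be combined by chaining pairwise merges. The sketch below only builds the command lines and does not run them. It assumes that a merged `.qza` can be passed back in as a metadata input (the `tabulate` call in the example usage does accept `md.qza` as metadata); the helper name `chained_merge_commands`, the `merged-N.qza` output names, and `md3.tsv` are hypothetical. Each pairwise step would still have to satisfy the no-double-overlap rule.

```python
from typing import List


def chained_merge_commands(inputs: List[str], prefix: str = "merged") -> List[List[str]]:
    """Build one `qiime metadata merge` command per pairwise step (illustrative only)."""
    if len(inputs) < 2:
        raise ValueError("need at least two metadata files")
    commands = []
    current = inputs[0]
    for step, next_input in enumerate(inputs[1:], start=1):
        output = f"{prefix}-{step}.qza"  # hypothetical intermediate file name
        commands.append([
            "qiime", "metadata", "merge",
            "--m-metadata1-file", current,
            "--m-metadata2-file", next_input,
            "--o-merged-metadata", output,
        ])
        # The result of this step becomes the first input of the next one.
        current = output
    return commands


cmds = chained_merge_commands(["md1.tsv", "md2.tsv", "md3.tsv"])
for cmd in cmds:
    print(" ".join(cmd))
```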

Example usage:

md1.tsv:

sample-id   barcode-sequence    body-site
#q2:types   categorical categorical
L2S155  ACGATGCGACCA    left palm
L2S175  AGCTATCCACGA    left palm
L2S204  ATGCAGCTCAGT    left palm
L2S222  CACGTGACATGT    left palm
L3S242  ACAGTTGCGCGA    right palm
L3S294  CACGACAGGCTA    right palm

md2.tsv:

sample-id   body-site   year
#q2:types   categorical numeric
L1S8    gut 2008
L1S57   gut 2009
L1S76   gut 2009
L1S105  gut 2009

$ qiime metadata merge --m-metadata1-file md1.tsv --m-metadata2-file md2.tsv --o-merged-metadata md.qza
$ qiime tools export --input-path md.qza --output-path md/
$ cat md/metadata.tsv
sample-id   barcode-sequence    body-site   year
#q2:types   categorical categorical numeric
L1S105      gut 2009
L1S57       gut 2009
L1S76       gut 2009
L1S8        gut 2008
L2S155  ACGATGCGACCA    left palm
L2S175  AGCTATCCACGA    left palm
L2S204  ATGCAGCTCAGT    left palm
L2S222  CACGTGACATGT    left palm
L3S242  ACAGTTGCGCGA    right palm
L3S294  CACGACAGGCTA    right palm

$ qiime metadata tabulate --m-input-file md.qza --o-visualization md.qzv

[Screenshot, 2023-07-13: the md.qzv visualization]

Fixes #11

gregcaporaso commented 1 year ago

Thanks @lizgehret!

lizgehret commented 1 year ago

For posterity, this also resolves: https://github.com/qiime2/qiime2/issues/633