iobis / env-data

ENV-DATA related issues and documentation

How to store omics #22

Open rubenpp7 opened 4 years ago

rubenpp7 commented 4 years ago

Lately we are being asked how the OBIS-ENV format deals with omics data. I have found the DwC term "associatedSequences", which we could use to link sequences to occurrences.

It seems suitable for the task, although I would like to know whether any of you have used it before.

albenson-usgs commented 4 years ago

I have used associatedSequences before to link an occurrence to its data in GenBank or another genomic repository. It works well from my perspective. The really sticky part with omics data is the taxonomic identifications and the processing pipeline, and how to document all of that in enough detail that downstream users can understand how a taxonomic identification was made. For instance, see this blog post by Rod Page, https://iphylo.blogspot.com/2019/12/gbif-metagenomics-and-metacrap.html, pointing out some issues with the TARA Oceans matches. If we can make all of this as transparent as possible, then downstream users can make better decisions about how to use the data. There's also the issue of what to do when there are no taxonomic identifications, just OTUs. This may be straying a bit from your original question, but the issue title is pretty broad :-)
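To make the associatedSequences approach concrete, here is a minimal sketch of an occurrence record that points at its sequences. It assumes the Darwin Core convention of joining multiple identifiers with " | ". The occurrenceID, the species name, and the accession placeholders (ACCESSION_1, ACCESSION_2) are hypothetical and only illustrate the shape of the field; they are not real records.

```python
import csv
import io

# Assumption: multiple values in a Darwin Core list field are joined with " | ".
SEPARATOR = " | "

FIELDS = ["occurrenceID", "scientificName", "associatedSequences"]


def build_occurrence(occurrence_id, scientific_name, sequence_uris):
    """Return one occurrence row whose associatedSequences lists the given URIs."""
    return {
        "occurrenceID": occurrence_id,
        "scientificName": scientific_name,
        "associatedSequences": SEPARATOR.join(sequence_uris),
    }


def to_csv(records):
    """Serialize occurrence rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical identifiers: replace with real occurrence IDs and accession URIs.
record = build_occurrence(
    "urn:example:occurrence:1",
    "Calanus finmarchicus",
    [
        "https://www.ncbi.nlm.nih.gov/nuccore/ACCESSION_1",
        "https://www.ncbi.nlm.nih.gov/nuccore/ACCESSION_2",
    ],
)

print(to_csv([record]))
```

This only covers the link itself. It does not capture the pipeline or identification details discussed above, which would need additional fields or an extension.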

pieterprovoost commented 4 years ago

FYI, there is some work going on here around packaging MIxS metadata into BIOM files and translating those into Darwin Core: https://github.com/GLOMICON/asvBiomXchange.

Besides associatedSequences, also take a look at the GGBN Amplification Extension and https://data-blog.gbif.org/post/gbif-molecular-data/. I'll try to come up with OBIS guidelines early next year.

@albenson-usgs We have similar issues with other datasets e.g. this one.