ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology.
https://docs.ropensci.org/CoordinateCleaner/

new metagenomics (MGnify) filter proposition and ideas #18

Open jhnwllr opened 5 years ago

jhnwllr commented 5 years ago

Background

GBIF has recently begun publishing records from the metagenomics publisher MGnify: https://www.gbif.org/publisher/ab733144-7043-4e88-bd4f-fca7bf858880

Typically these records are bacteria or other microbes. Often, however, they are trace DNA of a plant, animal, insect, or some other organism.

https://www.gbif.org/occurrence/taxonomy?publishing_org=ab733144-7043-4e88-bd4f-fca7bf858880
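One way to act on the publisher information above is to flag records by publisher key. The sketch below is written in Python (CoordinateCleaner itself is an R package) and is not part of CoordinateCleaner's API. The function name `flag_mgnify`, the assumption that each record carries a `publishingOrgKey` field as in a GBIF tab-delimited occurrence download, and the sample rows are all hypothetical; only the MGnify publisher key is taken from the URL above.

```python
import csv
import io

# MGnify's GBIF publisher key, taken from the publisher URL in this issue.
MGNIFY_PUBLISHER_KEY = "ab733144-7043-4e88-bd4f-fca7bf858880"


def flag_mgnify(records, publisher_keys=(MGNIFY_PUBLISHER_KEY,)):
    """Return one boolean per record: True if the record passes,
    False if it was published by one of the given publishers.

    Assumes (hypothetically) that each record is a dict with a
    'publishingOrgKey' field; records without that field pass.
    """
    blocked = set(publisher_keys)
    return [r.get("publishingOrgKey") not in blocked for r in records]


# Hypothetical two-row extract in GBIF-like tab-delimited form.
sample = (
    "gbifID\tpublishingOrgKey\tspecies\n"
    "1\tab733144-7043-4e88-bd4f-fca7bf858880\tQuercus robur\n"
    "2\t00000000-0000-0000-0000-000000000000\tQuercus robur\n"
)
records = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
flags = flag_mgnify(records)
print(flags)  # [False, True]
```

Passing a tuple of keys leaves room to add other metagenomics publishers later without changing the function.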

Problems

Solutions

jhnwllr commented 5 years ago

A new post on the GBIF data blog outlines some of the problems with this type of data:

https://data-blog.gbif.org/post/gbif-molecular-data-quality/

azizka commented 4 years ago

Thanks for the suggestions. Yes, these genomic data can be problematic. I am not sure whether we should add a separate function for this, since the metadata are probably the best way to address the problem. For instance, the "individualCount" information provided with GBIF data can be very helpful! Are you aware of a list of all GBIF providers that publish metagenomics data?
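A metadata-based check along the lines suggested above might look like the following Python sketch. The comment does not say how "individualCount" should be used, so the rule here (flag records whose count is below a user-chosen minimum) and the names `flag_individual_count`, `min_count` and `flag_missing` are hypothetical. Blank counts pass by default, because many ordinary occurrence records leave that field empty.

```python
def flag_individual_count(records, min_count=1, flag_missing=False):
    """Return one boolean per record: True if it passes, False if flagged.

    Hypothetical rule: a record is flagged when its individualCount is
    below min_count. Records with an empty or non-integer individualCount
    pass unless flag_missing is True.
    """
    flags = []
    for r in records:
        raw = (r.get("individualCount") or "").strip()
        try:
            count = int(raw)
        except ValueError:
            # Missing or non-integer value: pass or flag depending on flag_missing.
            flags.append(not flag_missing)
            continue
        flags.append(count >= min_count)
    return flags


records = [
    {"individualCount": "0"},
    {"individualCount": "3"},
    {"individualCount": ""},
]
print(flag_individual_count(records, min_count=1))  # [False, True, True]
print(flag_individual_count(records, min_count=1, flag_missing=True))  # [False, True, False]
```

Whether such a threshold is meaningful for a given metagenomics dataset would need to be checked against that dataset's own documentation.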

jhnwllr commented 4 years ago

This issue is discussed further here: https://discourse.gbif.org/t/metagenomics-and-metacrap/1583/13

This issue has been partly solved on the GBIF side, but "the problem" will likely continue to get worse.