wallaceEcoMod / wallace

an interactive, reproducible, expandable, instructional, and open-source GUI-based app for ecological niche modeling
https://wallaceecomod.github.io/
GNU General Public License v3.0

Citing GBIF data properly #374


jhnwllr commented 2 years ago

Hello, I am writing from GBIF.

I am doing a small outreach effort to R packages that use the GBIF occurrence search.

Under the terms of the GBIF data user agreement, users who download data agree to cite a DOI. Good citation also rewards data-publishing institutions and individuals by reinforcing the value of sharing open data and demonstrating its impact to their funders.

https://docs.ropensci.org/rgbif/articles/gbif_citations.html
https://www.gbif.org/citation-guidelines

Unfortunately, when using the occurrence search, rather than the occurrence download, one does not receive a citable DOI.

Because occurrence search is easier for some users to use, we have created something called derived datasets, which allows users to create a citable DOI after they have pulled the data from the GBIF public API.

https://www.gbif.org/derived-dataset

As a package maintainer, GBIF would appreciate it if you could remind users, in the documentation or with warning messages, to cite GBIF-mediated data properly, perhaps by linking to one of these articles:

https://docs.ropensci.org/rgbif/articles/gbif_citations.html
https://www.gbif.org/citation-guidelines
https://www.gbif.org/derived-dataset
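As an illustration of the warning-message option, here is a minimal Python sketch. It assumes an app that knows how many records came from the occurrence search. The helper name `warn_gbif_citation` and the message wording are hypothetical and do not come from Wallace or rgbif. Only the three links are taken from the comment above.

```python
import warnings

# Citation guidance links quoted in the GBIF comment above.
GBIF_CITATION_LINKS = (
    "https://docs.ropensci.org/rgbif/articles/gbif_citations.html",
    "https://www.gbif.org/citation-guidelines",
    "https://www.gbif.org/derived-dataset",
)


def warn_gbif_citation(n_records: int) -> str:
    """Emit a UserWarning reminding users to cite GBIF-mediated data.

    Hypothetical helper: name and wording are illustrative only.
    The message is also returned so a GUI could display it.
    """
    message = (
        f"{n_records} occurrence records were retrieved through the GBIF "
        "occurrence search, which does not provide a citable DOI. The GBIF "
        "data user agreement requires citing a DOI; consider registering a "
        "derived dataset. See: " + " ".join(GBIF_CITATION_LINKS)
    )
    warnings.warn(message, UserWarning, stacklevel=2)
    return message
```

A GUI app would probably show the returned text in a notification panel rather than relying on the Python warning alone.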

It is also important to remind users to keep the datasetKey column, because it allows proper attribution to the original data providers.
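One way to act on this reminder is to check, before export, that every record still carries a datasetKey. The sketch below assumes records are held as a list of dictionaries keyed by GBIF field names. The function `missing_dataset_keys` and the placeholder key values are hypothetical.

```python
from typing import Iterable, List, Mapping


def missing_dataset_keys(records: Iterable[Mapping[str, object]]) -> List[int]:
    """Return the indices of records with no usable datasetKey value.

    Hypothetical helper; assumes each record is a dict-like row
    using GBIF field names such as "datasetKey".
    """
    missing = []
    for i, rec in enumerate(records):
        key = rec.get("datasetKey")
        # Treat absent, non-string, or blank keys as missing.
        if not isinstance(key, str) or not key.strip():
            missing.append(i)
    return missing


# Placeholder rows for illustration only; real datasetKeys are UUIDs.
rows = [
    {"datasetKey": "dataset-A", "species": "Puma concolor"},
    {"species": "Puma concolor"},
    {"datasetKey": "", "species": "Puma concolor"},
]
print(missing_dataset_keys(rows))
```

Flagging the rows rather than silently dropping them leaves the user to decide whether to re-query or exclude them.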

gepinillab commented 2 years ago

Dear John (@jhnwllr),

The Wallace team is already aware that the recent update of the rgbif package makes it easier to obtain a citable DOI. The implementation of derived datasets is fantastic, and we will consider supporting them in our package in the future.

Over the last few years, we have worked on the second version of our package, and we will soon submit the manuscript describing it for review. One of its new features is obtaining a citable DOI using the occCite package (thanks to a collaboration with @hannahlowens). Users will therefore have the option to obtain a citable DOI for GBIF data when they want to download all the occurrences.

Thanks for writing to us about this critical topic. We will be looking to improve DOI citations for occurrence searches in future releases.

Regards, Gonzalo Pinilla

dnoesgaard commented 2 years ago

Hi Gonzalo,

I've been playing around with the v1.9 beta of Wallace, including the implementation of occCite for getting occurrences. I think this is a significant improvement, so thanks for that!

That being said, considering that citing GBIF using a DOI is a requirement of the terms of the GBIF data user agreement, I would love to see this implementation as the default behaviour in Wallace rather than optional.

Another solution (also mentioned by John) could be to retain the datasetKey column in the data pulled using spocc, allowing the user to create a derived dataset record for citing only the specific records used in the downstream analysis.
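To cite only the records actually used downstream, the retained datasetKey column only needs to be tallied after filtering. This is a minimal sketch. It assumes dictionary rows and uses a "has coordinates" filter as a stand-in for whatever processing the analysis applies. `count_by_dataset_key` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping


def count_by_dataset_key(
    records: Iterable[Mapping[str, object]],
    keep: Callable[[Mapping[str, object]], bool] = lambda rec: True,
) -> Counter:
    """Count records per datasetKey, considering only rows where keep(row) is True.

    Hypothetical helper; assumes every row has a "datasetKey" entry.
    """
    return Counter(rec["datasetKey"] for rec in records if keep(rec))


# Placeholder rows; only georeferenced records are "used downstream" here.
rows = [
    {"datasetKey": "dataset-A", "decimalLatitude": 1.0, "decimalLongitude": 2.0},
    {"datasetKey": "dataset-A", "decimalLatitude": None, "decimalLongitude": 2.0},
    {"datasetKey": "dataset-B", "decimalLatitude": 3.0, "decimalLongitude": 4.0},
]
has_coords = lambda r: r["decimalLatitude"] is not None and r["decimalLongitude"] is not None
print(count_by_dataset_key(rows, keep=has_coords))
```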

Thanks, Daniel

gepinillab commented 2 years ago

Hi Daniel,

Thanks for checking the current implementation of occCite in Wallace. I am glad that you like it. Unfortunately, making this option the default is not feasible because of (i) the time it can take to download species with thousands of records and (ii) the possibility that users' machines cannot handle "massive" occurrence databases (R holds data in RAM).

I think that the way to go is with derived datasets. First, I believe it is relatively easy to create an option in Wallace to download a CSV file with datasetKeys and occurrence counts that is ready to upload to gbif.org/derived-dataset. We can also generate a template for the description field required on that website, explaining how the data were obtained and processed in Wallace. This will make it easier for users to register these datasets.
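The export described above could look roughly like the following Python sketch. It assumes the upload takes two columns (datasetKey, count) as described in this comment. Whether a header row is expected is an assumption to verify on gbif.org/derived-dataset, so the header is left optional. `derived_dataset_csv`, `description_template`, and the template wording are hypothetical.

```python
import csv
import io
from typing import Mapping, Sequence


def derived_dataset_csv(counts: Mapping[str, int], include_header: bool = False) -> str:
    """Serialize datasetKey -> occurrence count pairs as two-column CSV text.

    Hypothetical helper; the exact upload format should be checked on
    gbif.org/derived-dataset before use.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if include_header:
        writer.writerow(["datasetKey", "count"])
    for key, n in sorted(counts.items()):  # sorted for reproducible output
        writer.writerow([key, n])
    return buffer.getvalue()


def description_template(species: str, n_records: int, steps: Sequence[str]) -> str:
    """Draft text for the derived-dataset description field (hypothetical wording)."""
    lines = [
        f"Occurrence records of {species} ({n_records} records) were retrieved "
        "from GBIF through the occurrence search in Wallace and processed as follows:"
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


# Placeholder keys for illustration; real datasetKeys are UUIDs.
print(derived_dataset_csv({"dataset-B": 3, "dataset-A": 5}))
print(description_template("Puma concolor", 8, ["removed records without coordinates"]))
```

Keeping the CSV and the description text as separate outputs lets the app write the CSV to disk and pre-fill the description for the user to edit.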

I will share this issue with the rest of our development team to get a potential timeline for its implementation in Wallace. We will keep you posted.

Best, Gonzalo