stevenpbachman / shinyRapidLeastConcern

Generate Least Concern Red List assessments and point maps for SIS Connect

Citing GBIF data #96

Open dnoesgaard opened 4 years ago

dnoesgaard commented 4 years ago

Maybe this has already been addressed in a separate issue, but I'm wondering if there's a way of highlighting or directly incorporating a citation of data pulled from GBIF?

As you may know, the GBIF use agreement requires that users specifically acknowledge the dataset publishers whose data they're using. For data obtained using the GBIF occurrence search API, this would mean citing all datasets that contributed records to the search.

Since rgbif is already used, perhaps the gbif_citation() function could be used for this purpose?

barnabywalker commented 4 years ago

Yeah, this is something we've been discussing, and is probably something that we should have had sorted at the start. We're just not sure what the best way to incorporate it is at the moment.

The output of this tool is all the files that you need to upload to the IUCN for a Least Concern assessment. On that front, the GBIF citations would be part of the bibliography for a published assessment, so we're trying to work out what's an acceptable way for us to provide this to the IUCN.

But there are also references for the spatial data provided with an assessment, so I think we can easily add the citations straight into that. Unless you meant adding the citations as some sort of visual element in the app itself?

Practically, we're using the occ_data function to request points from the GBIF API, which doesn't return the associated metadata to work with the gbif_citation function. Do you know if occ_search is significantly slower? I guess we can just try it and see, and then try to work something else out if it is.

dnoesgaard commented 4 years ago

Glad to hear that you're thinking about this :) Having GBIF citations as part of the bibliography would make sense.

Huh, I wasn't aware that gbif_citation didn't work with occ_data(). I would've expected all the necessary metadata to be part of the occ_data() response too. @sckott can you perhaps enlighten us here?

barnabywalker commented 4 years ago

Did a quick check, and occ_search is significantly slower than occ_data, which I think is because of all the extra metadata that occ_search gets.

BUT I did find out that just because the object returned by occ_data can't be used directly in gbif_citation, doesn't mean we can't get the citation info from the returned data, because the datasetKey is returned as a field. So I think this might have been a problem with my own ignorance.

So we can add in an extra bit to get the citation info for all datasets used fairly easily.
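
A minimal sketch of that extra bit, assuming the occurrence records are a data frame with a datasetKey column (as in the data element of a single-query occ_data() result). The helper names unique_dataset_keys and cite_datasets are hypothetical, the example keys are made up, and the lookup assumes gbif_citation() accepts a dataset key string; it needs rgbif and network access, so it is defined here but not called.

```r
# Collect the distinct GBIF dataset keys behind a set of occurrence records.
# `occurrences` is assumed to be a data frame with a `datasetKey` column.
unique_dataset_keys <- function(occurrences) {
  if (is.null(occurrences) || !"datasetKey" %in% names(occurrences)) {
    return(character(0))
  }
  keys <- as.character(occurrences$datasetKey)
  # Drop missing or empty keys, then de-duplicate and sort
  sort(unique(keys[!is.na(keys) & nzchar(keys)]))
}

# Hypothetical helper: one citation lookup per dataset key.
# Assumption: rgbif::gbif_citation() accepts a dataset key string.
cite_datasets <- function(keys) {
  lapply(keys, function(key) rgbif::gbif_citation(key))
}

# Offline example with made-up placeholder keys (not real GBIF datasets)
example_points <- data.frame(
  key = 1:4,
  datasetKey = c("key-b", "key-a", "key-b", NA),
  stringsAsFactors = FALSE
)
example_keys <- unique_dataset_keys(example_points)
```

Looking up citations per unique key rather than per record keeps the number of extra requests proportional to the number of contributing datasets, which is usually far smaller than the number of points.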

sckott commented 4 years ago

Yes, occ_data is faster than occ_search, as the latter deals with parsing all the data GBIF returns, whereas the former throws away all but the occurrence records.

Correct that gbif_citation isn't set up to work with occ_data out of the box. Having a look.

sckott commented 4 years ago

@barnabywalker occ_data now works with gbif_citation if you reinstall the dev version: remotes::install_github("ropensci/rgbif")

dnoesgaard commented 4 years ago

Thanks @sckott for the quick response, this is really appreciated! I imagine that a lot of users will benefit from this.

@barnabywalker, please let me know if you can work with gbif_citation() to include citation information in the bibliography. I think this will be a very helpful improvement to Red List assessments, which previously have only had very generic citations of the GBIF website rather than the actual providers of data.

dnoesgaard commented 4 years ago

fwiw, I just played around with gbif_citation() on an occ_data call and it works great!

> gbif_results <- occ_data(
+       taxonKey = 9206251,
+       hasGeospatialIssue = FALSE,
+       hasCoordinate = TRUE,
+       limit = 500
+       )
> citation_data <- gbif_citation(gbif_results)
> unlist(lapply(citation_data, "[[", c("citation", "citation")))
[1] "iNaturalist.org (2020). iNaturalist Research-grade Observations. Occurrence dataset https://doi.org/10.15468/ab3s5x accessed via GBIF.org on 2020-01-30.. Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2020-01-30"
[2] "Shah M, Coulson S (2020). Artportalen (Swedish Species Observation System). Version 92.176. ArtDatabanken. Occurrence dataset https://doi.org/10.15468/kllkyl accessed via GBIF.org on 2020-01-30.. Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2020-01-30"
...

The rgbif manual has some nice examples of dealing with citations...
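
For the bibliography, the citation strings could be de-duplicated and sorted before being written out. This is only a sketch: it assumes each element of the gbif_citation() result holds its text at citation$citation, as in the call above, and bibliography_lines is a hypothetical helper name. The mock entries are placeholders, not real citations.

```r
# Flatten a list of citation entries into a sorted, de-duplicated
# character vector, one line per dataset.
# Assumption: each entry stores its text at entry$citation$citation.
bibliography_lines <- function(citation_data) {
  texts <- vapply(
    citation_data,
    function(entry) entry[["citation"]][["citation"]],
    character(1)
  )
  sort(unique(texts))
}

# Offline example with placeholder entries mimicking that structure
mock_citations <- list(
  list(citation = list(citation = "Publisher B (2020). Dataset B.")),
  list(citation = list(citation = "Publisher A (2020). Dataset A.")),
  list(citation = list(citation = "Publisher B (2020). Dataset B."))
)
bib <- bibliography_lines(mock_citations)

# The lines could then be saved alongside the other output files, e.g.
# writeLines(bib, "gbif_citations.txt")
```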

barnabywalker commented 4 years ago

@dnoesgaard yeah we'll get this incorporated into the app as soon as we can, and try and find the best way to incorporate the citations in the assessments. But I don't think this will be a solution for assessments that aren't generated using this app.

dnoesgaard commented 4 years ago

Thanks for the help!

My colleague @andrewrodrigues is working with expert groups to ensure there is a focus on citing data there too. So hopefully we'll see fewer unspecific GBIF citations.

sckott commented 4 years ago

A new version of rgbif (with this change and others) will be on CRAN in the next few days or early next week.