trias-project / unified-checklist

🇧🇪 Global Register of Introduced and Invasive Species - Belgium
https://trias-project.github.io/unified-checklist/
MIT License

Input requested for some record-level terms #31

Closed LienReyserhove closed 5 years ago

LienReyserhove commented 5 years ago

For some of the record-level terms, I'm not 100% sure what information to use:

  1. license: Taxa info comes from backbone (CC-BY), rest from datasets, might not always be CC0. I would use the most limiting license, i.e. CC-BY

  2. rightsHolder:

    Organization who has the rights to the data and in the case of multiple rights holder, the organization who managed/made the decision to release those rights under CC0. Is often the same as publishing organization.

This is a difficult one, I'm tempted to say that the owners of the checklists are the rightsHolders, but then we're ignoring the taxonomic information from the backbone...

  3. institutionCode: this is now populated with "INBO", but according to our guidelines, this should be the same as rightsHolder.

peterdesmet commented 5 years ago

Not really an answer, but:

  1. license: I would add the dataset license to checklist.csv (just like we added the citation), so we at least know if there are differences. Then we can decide what to do.
  2. rightsHolder: I would add the publisher name to checklist.csv, so we at least know who we could include.
  3. institutionCode: INBO is fine.

LienReyserhove commented 5 years ago

OK, so:

  1. license: we decided to use the least open license among the data we aggregated. This will most probably be CC-BY; the information is to be extracted from the GBIF API, as suggested.

  2. rightsHolder: if we add the publisher name (as contained in the GBIF API) to checklist.csv, we will still miss some information. For instance, the publisher of the RINSE registry checklist is INBO, while I think it is also correct to use University of Cambridge as a rightsHolder. That information is specified in the record-level term rightsHolder in the taxon core of the checklist. So, is there a way to extract all rightsHolder information contained in the taxon cores of all checklists? @damianooldoni @peterdesmet ?

  3. OK

LienReyserhove commented 5 years ago

Concerning 2: I see that this information can be found here --> is this an option?

damianooldoni commented 5 years ago

It is an option, but I think it will slow everything down. I see it as a last resort. But maybe I am too pessimistic?

damianooldoni commented 5 years ago

For the taxon in your example, the code looks like this:

library(rgbif)  # name_usage()
library(dplyr)  # %>%, select(), ends_with()
taxa_verbatim <- name_usage(key = 141264581, return = "data", data = "verbatim")
rights_holder <- taxa_verbatim %>% select(ends_with("rightsHolder"))

Easy to extend to many taxa via map_df, I think. Just ping me if needed. Still, querying the verbatim endpoint taxon by taxon will take a lot of time.
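
A minimal sketch of that extension, under stated assumptions: `fetch_verbatim()`, `get_rights_holder()` and `collect_rights_holders()` are hypothetical helper names, not part of rgbif or of this repository. The sketch uses `map_dfr()` from purrr (the row-binding variant of `map_df`) and reads the `data` element of the `name_usage()` result instead of passing `return = "data"`, on the assumption that this works across rgbif versions. It still makes one request per taxon, so the performance concern stands.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: fetch the verbatim record for one taxon key.
# Assumes name_usage() returns a list with a `data` element.
fetch_verbatim <- function(key) {
  rgbif::name_usage(key = key, data = "verbatim")$data
}

# Keep only the rightsHolder column(s) and remember which taxon they belong to.
get_rights_holder <- function(key, fetch = fetch_verbatim) {
  fetch(key) %>%
    select(ends_with("rightsHolder")) %>%
    mutate(taxonKey = key)
}

# Apply the single-taxon lookup to many keys and row-bind the results.
# `fetch` can be swapped for a stub to try the logic without network access.
collect_rights_holders <- function(keys, fetch = fetch_verbatim) {
  map_dfr(keys, get_rights_holder, fetch = fetch)
}

# Usage (one API request per key):
# rights_holders <- collect_rights_holders(c(141264581))
```

Taxa whose verbatim record has no rightsHolder term would end up with `NA` in that column after row-binding, which makes the gaps easy to spot.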

LienReyserhove commented 5 years ago

Thanks! @peterdesmet what do you think?

LienReyserhove commented 5 years ago

With respect to license, what are the possible licenses we could encounter in a checklist?

I would think:

  1. CC0
  2. CC-BY

Others are potentially:

  3. CC-BY-SA
  4. CC-BY-ND
  5. CC-BY-NC

Should we take 3-5 into account here? And is the order (1-5) also the order of increasing restrictiveness? Or are 3-5 more or less equal in that respect, just with a different focus?

LienReyserhove commented 5 years ago

After discussion:

  1. CC0
  2. CC-BY
  3. CC-BY-NC

in that order (least to most restrictive).

peterdesmet commented 5 years ago

Decision with @qgroom :

  1. license:

For the license we (unfortunately have to) choose the most restrictive license of the source checklists:

checklists %>%
  group_by(license) %>%
  count()

... is the most restrictive license:

taxon %<>% mutate(...)

@LienReyserhove note: in this step, also include the GBIF Backbone Taxonomy, but exclude it when querying taxa in this step.

  2. rightsHolder:

We do not set a rightsHolder, as the taxon and its related information are based on different source checklists (which in turn are based on other sources), published by different organizations, and mostly released under CC0. Rather, we make an effort to credit the source in references (for taxa) and source (in the extensions).

taxon %<>% mutate(rightsHolder = NA)

  3. institutionCode: this field is not officially part of the taxon core (https://github.com/gbif/rs.gbif.org/issues/20), but according to our guidelines, this should be the publisher, which will be ISSG.

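The "most restrictive license" rule for the license field could be sketched as follows. This is only an illustration, not the code used in the repository: `license_order` and `most_restrictive()` are hypothetical names, the `checklists` data frame is made-up example data, and the licenses are assumed to be stored as the short labels used in this thread (the GBIF API may return them in another form). The ranking CC0 < CC-BY < CC-BY-NC is the one agreed on earlier in this issue.

```r
library(dplyr)

# Assumed ranking, least to most restrictive, as agreed earlier in this issue
license_order <- c("CC0", "CC-BY", "CC-BY-NC")

# Hypothetical helper: return the most restrictive license in a vector.
# Stops on licenses that are not ranked, rather than ignoring them silently.
most_restrictive <- function(licenses, order = license_order) {
  unknown <- setdiff(licenses, order)
  if (length(unknown) > 0) {
    stop("Unranked license(s): ", paste(unknown, collapse = ", "))
  }
  order[max(match(licenses, order))]
}

# Made-up stand-in for the source checklists (including the backbone)
checklists <- tibble(
  title = c("Checklist A", "Checklist B", "GBIF Backbone Taxonomy"),
  license = c("CC0", "CC0", "CC-BY")
)

unified_license <- most_restrictive(checklists$license)

# The result could then be assigned to every taxon, e.g.:
# taxon <- taxon %>% mutate(license = unified_license)
```

Failing on an unranked license (for example CC-BY-SA) is a deliberate choice here: it forces a decision about where such a license sits in the ranking before it can affect the result.
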
peterdesmet commented 5 years ago

Decisions:

license

https://github.com/trias-project/unified-checklist/blob/4b7605e0b5bf2c1a8a2f9a4e3e737c6a51818ff0/src/6_dwc_mapping.Rmd#L204

rightsHolder

https://github.com/trias-project/unified-checklist/blob/4b7605e0b5bf2c1a8a2f9a4e3e737c6a51818ff0/src/6_dwc_mapping.Rmd#L248

institutionCode

https://github.com/trias-project/unified-checklist/blob/4b7605e0b5bf2c1a8a2f9a4e3e737c6a51818ff0/src/6_dwc_mapping.Rmd#L278