LienReyserhove closed 5 years ago
Not really an answer, but:
license: I would add the dataset license to checklist.csv (just like we added the citation), so we at least know if there are differences. Then we can decide what to do.
rightsHolder: I would add the publisher name to checklist.csv, so we at least know who we could include.
institutionCode: INBO is fine.
OK, so:
1. As a license, we decided to use the least open license of the data we aggregated. This will most probably be a CC-BY license; the information can be extracted from the GBIF API as suggested.
2. rightsHolder: if we add the publisher name to checklist.csv as contained in the GBIF API, we will still miss some information. For instance, the publisher of the RINSE registry checklist is INBO, while I think it is also correct to use University of Cambridge as a rightsHolder. This content is specified in the record-level term rightsHolder in the taxon core of the checklist. So, is there a way to extract all rightsHolder information contained in the taxon cores of all checklists? @damianooldoni @peterdesmet ?
OK
Concerning 2: I see that this information can be found here --> is this an option?
It is an option, but I think it will slow everything down. I see it as the last possible option. But maybe I am too pessimistic?
For the taxon in your example, the code looks like this:
taxa_verbatim <- name_usage(key = 141264581, return = "data", data = "verbatim")
rights_holder <- taxa_verbatim %>% select(ends_with("rightsHolder"))
Easy to extend to many via map_df, I think. Just ping me if needed.
Still, querying the verbatim endpoint taxon by taxon will take a lot of time.
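To illustrate the "extend to many" idea, here is a minimal sketch, not a tested implementation. It uses base R `lapply` in place of `map_df` to keep dependencies light. The helper names `fetch_verbatim` and `get_rights_holders` are hypothetical, and the sketch assumes the verbatim record exposes a column whose name ends in `rightsHolder`, as in the snippet above. Depending on the rgbif version, `name_usage()` may return the data directly or a list with a `data` element, so the sketch handles both.

```r
# Hypothetical helper: fetch the verbatim record for one taxon key.
# Assumes the rgbif package is installed; performs one API call per key.
fetch_verbatim <- function(key) {
  res <- rgbif::name_usage(key = key, data = "verbatim")
  # Newer rgbif versions return a list with a `data` element.
  if (!is.data.frame(res) && is.list(res) && !is.null(res$data)) {
    res <- res$data
  }
  res
}

# Hypothetical helper: collect the rightsHolder value for many taxon keys.
# `fetch` is injectable so the logic can be tried without network access.
get_rights_holders <- function(keys, fetch = fetch_verbatim) {
  rows <- lapply(keys, function(key) {
    verbatim <- fetch(key)
    # Match the column by suffix, like ends_with("rightsHolder").
    col <- grep("rightsHolder$", names(verbatim), value = TRUE)
    data.frame(
      key = key,
      rightsHolder = if (length(col) > 0) {
        as.character(verbatim[[col[1]]][1])
      } else {
        NA_character_  # term not present for this taxon
      },
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

Calling `get_rights_holders(c(141264581))` would query the API once per key, so the concern about speed still applies for large checklists.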
Thanks! @peterdesmet what do you think?
With respect to license, what are the possible licenses we could encounter in a checklist?
I would think:
Others are potentially:
Should we take 3-5 into account here? And is the order (1 - 5) here also the order of the degree of limitation? Or are 3 - 5 more or less equal in that respect, just with another focus...
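One way to make "least open license" concrete is to rank the licenses and pick the highest-ranked one present. The sketch below is only an illustration: the ranking, the short license labels, the function name `most_restrictive_license`, and the `checklists` data frame are all assumptions, and the values actually stored in checklist.csv may be spelled differently (for example as license URLs).

```r
# Assumed ranking from least (1) to most (3) restrictive.
# Extend or reorder this if other licenses turn up in the checklists.
license_rank <- c("CC0-1.0" = 1, "CC-BY-4.0" = 2, "CC-BY-NC-4.0" = 3)

# Hypothetical helper: return the most restrictive license in a vector.
most_restrictive_license <- function(licenses) {
  licenses <- unique(licenses[!is.na(licenses)])
  unknown <- setdiff(licenses, names(license_rank))
  if (length(unknown) > 0) {
    # Fail loudly rather than silently ignoring an unranked license.
    stop("Unranked license(s): ", paste(unknown, collapse = ", "))
  }
  if (length(licenses) == 0) {
    return(NA_character_)
  }
  names(which.max(license_rank[licenses]))
}

# Hypothetical stand-in for the license column of checklist.csv.
checklists <- data.frame(
  datasetKey = c("a", "b", "c"),
  license = c("CC0-1.0", "CC-BY-4.0", "CC0-1.0"),
  stringsAsFactors = FALSE
)

most_restrictive_license(checklists$license)
```

Stopping on an unranked license is a deliberate choice here: a new license in a source checklist should trigger a decision about where it sits in the order, not a silent default.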
After discussion:
Decision with @qgroom :
license: For the license we (unfortunately have to) choose the most restrictive license of the source checklists:
checklists %>% group_by(license) %>% count()
... is the most restrictive license:
taxon %<>% mutate(...)
@LienReyserhove note: also include the GBIF Backbone Taxonomy in this step, but exclude it when querying taxa.
rightsHolder: We do not set a rightsHolder, as the taxon and its related information are based on different source checklists (which in turn are based on other sources), published by different organizations, and mostly released under CC0. Rather, we make an effort to credit the source in references (for taxa) and source (in the extensions).
taxon %<>% mutate(rightsHolder = NA)
institutionCode: this field is not officially part of the taxon core (https://github.com/gbif/rs.gbif.org/issues/20), but according to our guidelines, this should be the publisher, which will be ISSG.
Decisions:
license
rightsHolder
institutionCode
For some of the record-level terms, I'm not 100% sure what information to use:
license: Taxa info comes from the backbone (CC-BY), the rest from datasets, which might not always be CC0. I would use the most limiting license, i.e. CC-BY.
rightsHolder: This is a difficult one. I'm tempted to say that the owners of the checklists are the rightsHolders, but then we're ignoring the taxonomic information from the backbone...
institutionCode: Is now populated with "INBO", but according to our guidelines, this should be the same as rightsHolder.