Closed LienReyserhove closed 6 years ago
Answer by @peterdesmet:
Regarding 1: pick the 🧠 of @stijnvanhoey.
Regarding 2: let's say we have 30 unique references
1. Create a `sources.csv` file with those 30 sources:
   - `number`: 1, 2, 3 as in data
   - `identifier`: DOI, or other link
   - `full_reference`: Pensoft written citation
2. Populate `identifier` with the DOI. If none is available, try to find a PDF link. If that is not available, create a unique code (e.g. `smith_2016`).
3. Create a reference extension with `taxonid`, `identifier` and `bibliographicCitation`. This extension will contain many duplicates.
4. Use `identifier` (DOI) in the distribution extension to populate `source`. Multiple sources should be separated with space pipe space (` | `).
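The steps above could be sketched roughly as follows. This is only an illustration in Python, not the project's R script: the column names `number`, `identifier`, `full_reference`, `taxonid` and `bibliographicCitation` come from the proposal, while the sample rows, the taxon ID and all function names are made up for the example.

```python
import csv
import io

# Hypothetical stand-in for sources.csv; the rows are invented for illustration.
SOURCES_CSV = '''number,identifier,full_reference
1,10.0000/example-1,"Smith J (2016) First example title. Example Journal 1: 1-10."
2,smith_2016,"Smith J (2016) Second example title. Unpublished report."
'''


def load_sources(handle):
    """Map each reference number to its (identifier, full_reference) pair."""
    return {
        int(row["number"]): (row["identifier"], row["full_reference"])
        for row in csv.DictReader(handle)
    }


def build_source_field(numbers, sources):
    """Join the identifiers of the given reference numbers with ' | '."""
    return " | ".join(sources[n][0] for n in numbers)


def build_reference_rows(taxon_id, numbers, sources):
    """Create one reference-extension row per (taxon, reference) pair."""
    return [
        {
            "taxonid": taxon_id,
            "identifier": sources[n][0],
            "bibliographicCitation": sources[n][1],
        }
        for n in numbers
    ]


sources = load_sources(io.StringIO(SOURCES_CSV))
print(build_source_field([1, 2], sources))
for row in build_reference_rows("taxon-1", [1, 2], sources):
    print(row)
```

Because every taxon repeats the full citation of each of its references, the reference extension grows to one row per taxon and reference, which is where the many duplicates come from.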
Some feedback needed:
The Zieritz et al. (2016) checklist has a `reference` column containing numbers. Two things with respect to that:

1. The numbers are separated by commas and hyphens. The hyphen is used to indicate a sequence, i.e. `1-4` refers to references 1, 2, 3 and 4. We need the latter. I haven't yet figured out how to generate these sequences in a way that keeps the code readable. Thus, I suggest generating the sequences in the raw data file rather than performing the cleaning in the R script (which would make it messier). As this is a dead dataset, I think the cleaning step won't do any harm.
2. For some species, about 12 reference numbers are provided, which is a lot. Just to be sure: is it really necessary to integrate the full reference? The fields will be full of text, but I guess there's no way around that, right?
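For what it's worth, expanding the hyphenated sequences in code can stay fairly readable. Below is a minimal sketch in Python (not R, so it would still need porting to the actual script). The function name is hypothetical, and it assumes the values use plain ASCII hyphens and commas, as in `1-4, 7`.

```python
def expand_reference_numbers(text):
    """Expand a string such as '1-4, 7' into [1, 2, 3, 4, 7]."""
    numbers = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. after a trailing comma
        if "-" in part:
            # A hyphen marks an inclusive sequence: '1-4' means 1, 2, 3, 4.
            start, end = (int(piece) for piece in part.split("-", 1))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


print(expand_reference_numbers("1-4, 7, 9-10"))  # [1, 2, 3, 4, 7, 9, 10]
```

Doing this in the script rather than in the raw file would keep the raw data identical to the published checklist, at the cost of one small helper.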