identifier field - Githubissues

peterdesmet commented 6 years ago

I noticed there is an (undocumented) identifier field added to the spreadsheet, which seems to hold the URL that the source might contain, e.g. a link to a PDF or a DOI. The field is used for the source of the distribution.

I think it would be better to drop the field and only keep source. That way only one field has to be maintained, given that you try to always write URLs (including for DOIs) in full and at the end. @qgroom @timadriaens what do you think?

For the source of the distribution, we can then try to extract the URL using regex in the script.
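As a rough illustration of the regex idea, here is a minimal Python sketch. It is not the project's actual script: the pattern, the function name `extract_urls`, and the example reference are all hypothetical, and it assumes URLs are written in full with http:// or https://.

```python
import re

# Assumed pattern: a URL starts with http:// or https:// and runs
# until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(source):
    """Return all URLs found in a free-text source string."""
    urls = []
    for match in URL_PATTERN.findall(source):
        # Strip trailing punctuation that often follows a URL in a citation.
        urls.append(match.rstrip(".,;)"))
    return urls


# Hypothetical reference, for illustration only.
example = "Doe J (2015) A made-up checklist. https://doi.org/10.1234/example."
print(extract_urls(example))
```

Stripping trailing punctuation is a design choice that works when URLs sit at the end of a reference; a URL that legitimately ends in one of those characters would be truncated.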

qgroom commented 6 years ago

Could we use the term dcterms:references for these items? The definition is "A related resource that is referenced, cited, or otherwise pointed to by the described resource."

peterdesmet commented 6 years ago

There is no need for that, and it would be a bit of a misuse of that field: references is supposed to point to that same taxon, but described in more detail. A good example for occurrences would be the URL of an iNaturalist observation for an occurrence record of that observation.

1. To better add references, we used the richer http://rs.gbif.org/extension/gbif/1.0/references.xml extension. There we add the full reference as written in "source". If that field is not intended to be a reference for the taxon, we'll stop doing that.
2. For distributions we use a source. There we opted to only include a URL (not the full reference), concatenated with |. But we could opt to add the full reference if you want.

See for example how we did it for RINSE pathways: https://github.com/trias-project/rinse-pathways-checklist/tree/master/data/processed

The question here is whether, if we continue to do 2 (add URL only), it would be easier to extract the URL from the full reference in the spreadsheet using a script, rather than maintaining both a full reference (source) and a URL field (identifier). Let me know how you see this.

qgroom commented 6 years ago

I'm not clear on all the repercussions of the options. For the sake of consistency I prefer the solution in the RINSE dataset, but I'm not really following you on extracting from the full reference. Couldn't the full reference be to the observation and not the description of the distribution?

timadriaens commented 6 years ago

there are indeed more than a few records for which the source is effectively a URL to an observation

peterdesmet commented 6 years ago

I have removed the field identifier as almost everywhere the URL was already included (often as part of a full reference) in the field source. Where that was not the case (often because DOI was not written as a URL), I have updated the information in source.

@timadriaens @qgroom please write any URL in the field source (including DOIs) with http:// (or https:// for DOIs)!

I'll update the script to handle this info.
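One way to check that convention is to flag sources that still contain a DOI not written as a URL. The sketch below is only an assumption about how that could be done, not the script referred to above; the DOI pattern is a common approximation, and the names `find_bare_dois` and `doi_to_url` are hypothetical.

```python
import re

# Assumed patterns: URLs start with http:// or https://; DOIs start with
# "10.", a 4-9 digit registrant code, and a slash (an approximation).
URL_PATTERN = re.compile(r"https?://\S+")
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def find_bare_dois(source):
    """Return DOIs in a source string that are not part of a URL."""
    # Blank out full URLs first, so DOIs inside them are not reported.
    without_urls = URL_PATTERN.sub(" ", source)
    return [doi.rstrip(".,;)") for doi in DOI_PATTERN.findall(without_urls)]


def doi_to_url(doi):
    """Write a bare DOI as a full https:// URL."""
    return "https://doi.org/" + doi


# Hypothetical source value, for illustration only.
for doi in find_bare_dois("Doe J (2015) A made-up checklist. doi:10.1234/example."):
    print(doi, "->", doi_to_url(doi))
```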

trias-project / ad-hoc-checklist

identifier field #25