ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogeneous data
https://docs.ropensci.org/EML

Species name epithet is not handled the way specified in the EML schema #328

Closed: laijasmine closed this issue 3 years ago

laijasmine commented 3 years ago

From the EML schema documentation for `taxonRankValue`:

Note that for the taxonomic rank "species", the accepted practice is to use binomial nomenclature, i.e., a combination of the genus name plus species epithet is required to denote the species. Therefore the "species" is not the species epithet alone.

The example given in the schema is "Acer rubrum" for a species rank value.
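To make the schema rule concrete, here is a minimal R sketch of a check one might run on a species-rank value. `looks_binomial()` is a hypothetical helper, not part of the EML package, and it only tests that the value contains at least two whitespace-separated words; it does not validate the name itself.

```r
# Hypothetical helper (not part of the EML package): flag "species" rank
# values that look like a bare epithet rather than a binomial.
looks_binomial <- function(value) {
  # Trim, split on runs of whitespace, and require at least two tokens
  tokens <- strsplit(trimws(value), "\\s+")
  vapply(tokens, length, integer(1)) >= 2
}

looks_binomial("Acer rubrum")  # genus plus specific epithet
looks_binomial("rubrum")       # epithet alone, not a valid species value
```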

However, the EML package splits the species name on the space into a genus and a species epithet: https://github.com/ropensci/EML/blob/54052b900e7a2f8950d4c9ef737cd1956ce5c44e/R/set_coverage.R#L170-L185
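A minimal R sketch of the splitting behavior described above, to show why it conflicts with the schema. `split_coverage()` is hypothetical and only approximates the linked code; the nested list layout and the lowercase rank names are assumptions for illustration.

```r
# Hypothetical re-creation of the described behavior (not the EML package's
# actual code): split on the space into a genus and a nested species level.
split_coverage <- function(sci_name) {
  s <- strsplit(sci_name, " ")[[1]]
  list(
    taxonRankName = "genus",
    taxonRankValue = s[1],
    taxonomicClassification = list(
      taxonRankName = "species",
      taxonRankValue = s[2]  # only the epithet survives at species rank
    )
  )
}

x <- split_coverage("Acer rubrum")
x$taxonomicClassification$taxonRankValue  # "rubrum", not "Acer rubrum"
```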

A PR to make sure the species name is not split is coming shortly!

cboettig commented 3 years ago

@laijasmine Thanks, good point. I think the better behavior here when given a list of scientific names would be to not split, like you say, but also to omit the nested structure, which is just noise. It should be something like:

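```r
# Flat, unsplit coverage: one Species-rank entry per full scientific name
if (is.character(sci_names) && !expand) {
  taxa <- lapply(sci_names, function(s) {
    list(
      taxonRankName = "Species",
      taxonRankValue = s  # e.g. "Acer rubrum", kept whole
    )
  })
}
```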
Anyway, ideally this would also support giving `<Genus> <specificEpithet> <infraspecificEpithet>` as the "Species" name.
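The proposal above can be sketched as a self-contained R function. `species_coverage()` and the `list(taxonomicClassification = ...)` wrapper are assumptions for illustration, not the EML package's API. Because the name is never split, trinomials pass through unchanged, which covers the `<Genus> <specificEpithet> <infraspecificEpithet>` case without extra logic.

```r
# Hypothetical wrapper around the flat, unsplit structure proposed above.
species_coverage <- function(sci_names) {
  stopifnot(is.character(sci_names))
  taxa <- lapply(sci_names, function(s) {
    list(
      taxonRankName = "Species",
      taxonRankValue = s  # full name, never split
    )
  })
  # Assumed wrapper: one unnamed entry per name under taxonomicClassification
  list(taxonomicClassification = taxa)
}

cov <- species_coverage(c("Acer rubrum", "Canis lupus familiaris"))
# Binomials and trinomials both come through as whole names
vapply(cov$taxonomicClassification, `[[`, character(1), "taxonRankValue")
```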

I think when giving the full classification, it would make sense to follow Darwin Core terms for rank levels: list the specific epithet as `specificEpithet` (which should indeed not include the genus name), and avoid a rank of "species" altogether, since Darwin Core defines no `species` term.
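For the full-classification case, here is a rough R sketch of mapping a name onto Darwin Core rank terms. `dwc_ranks()` is hypothetical and assumes a bare name with no authorship and no rank markers such as "var." or "subsp."; real-world names would need a proper parser.

```r
# Hypothetical helper: map a space-separated name onto Darwin Core terms.
# Assumes "Genus epithet [infraspecific epithet]" with nothing else attached.
dwc_ranks <- function(sci_name) {
  parts <- strsplit(trimws(sci_name), "\\s+")[[1]]
  terms <- c("genus", "specificEpithet", "infraspecificEpithet")
  if (length(parts) > length(terms)) {
    stop("unexpected extra tokens in: ", sci_name)
  }
  # Pair each token with its Darwin Core term, in order; no "species" rank
  data.frame(
    taxonRankName = terms[seq_along(parts)],
    taxonRankValue = parts,
    stringsAsFactors = FALSE
  )
}

dwc_ranks("Canis lupus familiaris")
```

Note that `specificEpithet` here holds only "lupus", matching the Darwin Core convention that the epithet excludes the genus.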

(Really, I think the choice to define hierarchical ranks as nested elements instead of flat rank-value pairs was less than ideal anyway, but oh well!)
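To illustrate the rank-value alternative, here is a sketch that flattens a nested classification into one row per rank. `flatten_classification()` is hypothetical and assumes each level has at most one child, i.e. a simple chain rather than a branching tree.

```r
# Hypothetical helper: walk a nested taxonomicClassification list and return
# flat rank-value pairs, one row per level, from the top down.
flatten_classification <- function(node) {
  rows <- list()
  while (!is.null(node)) {
    rows[[length(rows) + 1]] <- data.frame(
      rank = node$taxonRankName,
      value = node$taxonRankValue,
      stringsAsFactors = FALSE
    )
    # Assumes a single child per level; NULL when the chain ends
    node <- node$taxonomicClassification
  }
  do.call(rbind, rows)
}

nested <- list(
  taxonRankName = "Genus", taxonRankValue = "Acer",
  taxonomicClassification = list(
    taxonRankName = "Species", taxonRankValue = "Acer rubrum"
  )
)
flatten_classification(nested)
```

A flat table like this can be filtered or joined directly, whereas the nested form needs recursion just to read.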

Thanks in advance for the PR!

laijasmine commented 3 years ago

Resolved in PR #329.