trias-project / ad-hoc-checklist

🍀 Ad hoc checklist of alien species in Belgium
https://trias-project.github.io/ad-hoc-checklist
MIT License

First mapping #17

Closed: LienReyserhove closed this 6 years ago.

LienReyserhove commented 6 years ago

This is a first attempt to map the ad hoc checklist data. @peterdesmet, now ready for review! Here is an overview of all mapped DwC terms, remaining questions and issues.

Record-level terms

References: see #12. For now, I used the identifiers, not the full bibliographic citations; the full citations went into the literature references extension. However, many identifiers are missing, so this mapping is unfinished and will have to be revised once all information is available.

Taxon core

Literature references extension:

For now, this extension only contains the full bibliographic citations; I have not included the identifiers yet. Whether the identifiers can or will be integrated depends on the raw data. See #12.
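When one raw cell holds several citations for a taxon, one way to fill such an extension is to split that cell into one row per reference. The sketch below assumes that citations are separated by "|" (the delimiter discussed later in this thread). The column names taxon_id and references and the example values are hypothetical. taxonID and bibliographicCitation are the Darwin Core terms I would expect in this extension, but they are not necessarily what the mapping script uses.

```r
# Hypothetical raw data: one row per taxon, several references
# joined by "|" (column names and values are illustrative only)
raw <- data.frame(
  taxon_id   = c("taxon-1", "taxon-2"),
  references = c("Ref A|Ref B", "Ref C"),
  stringsAsFactors = FALSE
)

# Split each string on "|", dropping any spaces around the pipe
split_refs <- strsplit(raw$references, "\\s*\\|\\s*")

# One output row per (taxon, reference) pair: repeat each taxon_id
# as many times as it has references, then flatten the reference list
literature_references <- data.frame(
  taxonID               = rep(raw$taxon_id, lengths(split_refs)),
  bibliographicCitation = unlist(split_refs),
  stringsAsFactors      = FALSE
)

print(literature_references)
```

In a tidyverse script, tidyr::separate_rows() with the same sep pattern does this in a single call.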

Distribution extension:

The fields locationID and locality are based on the information in the location field of the raw GS file. Not all records had this field populated; I assumed that in those cases locality = Belgium.
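Here is a base R sketch of the fallback rule for empty locations. The location column name and the example values are hypothetical. The sketch only fills locality and does not cover how locationID is derived.

```r
# Hypothetical raw column "location"; empty or NA means "not specified"
raw <- data.frame(
  location = c("Flanders", "", NA),
  stringsAsFactors = FALSE
)

# Treat NA and blank (or whitespace-only) strings the same way
is_missing <- is.na(raw$location) | trimws(raw$location) == ""

# Fall back to the country when no finer location was recorded
raw$locality <- ifelse(is_missing, "Belgium", raw$location)

print(raw)
```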

Species profile extension

Description extension

Native range

Mapped to WGSRPD standard:

Mapped to or matches with UN geoscheme:

Doesn't match any of the vocabularies above:
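The three-way split above (WGSRPD, UN geoscheme, no match) can be sketched as a simple classifier. The two vocabularies below are tiny illustrative subsets, not the full standards, and the raw values are made up. WGSRPD is checked first, in line with the preference for that vocabulary expressed later in the thread.

```r
# Illustrative subsets only: the real vocabularies are much larger
wgsrpd_terms       <- c("Europe", "Northern America", "Southern America")
un_geoscheme_terms <- c("Western Europe", "Eastern Asia", "Northern Africa")

# Return the vocabulary a raw native-range value matches, checking WGSRPD first
classify_native_range <- function(value) {
  if (value %in% wgsrpd_terms) {
    "WGSRPD"
  } else if (value %in% un_geoscheme_terms) {
    "UN geoscheme"
  } else {
    "no match"
  }
}

# Hypothetical raw values from the native range column
raw_native_range <- c("Europe", "Eastern Asia", "Cosmopolitan")

native_range_class <- vapply(
  raw_native_range, classify_native_range, character(1), USE.NAMES = FALSE
)
print(native_range_class)
```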

Pathway of introduction

Data were already mapped to the CBD standard.

Invasion stage

See #16.

peterdesmet commented 6 years ago

Questions

Data issues:

LienReyserhove commented 6 years ago

To reply:

I would remove references

I would not. I think we really need this, as we don't have a good, short identifier for each species (e.g. when no DOI is available). In that case, the full citation is the most complete and best information we have.

Is the script able to parse | without pipes? I noticed one instance where the space was missing (now corrected in source data)

Do you mean "without spaces"? If so, yes: it works even when a space is missing. You can try it with this script (add or remove spaces between test1 and test2):

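library(tidyr)  # provides separate()

# Toy data: pipe-separated values; add or remove spaces around "|" to test
data_frame <- data.frame(
  test = c("test1|test2", "test3|test4"),
  stringsAsFactors = FALSE
)

# Split on the literal pipe ("|" must be escaped in the regex);
# any spaces next to the pipe are kept in the resulting values
separate(
  data = data_frame,
  col = "test",
  into = c("column_A", "column_B"),
  sep = "\\|"
)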

What happens if two records of the same species (because of different distribution) differ in e.g. order? Are two taxa created? If so, I would double check if there are no duplicate taxonIDs

In the case of one single species with two distribution records, the code will create only one taxonID for that species and the taxon core will contain only one record for that taxon.
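The behavior described here can be illustrated as follows: one taxon core record per species, several distribution rows, plus the duplicate-taxonID check suggested in the question. The column names, values and the sequential taxon-N ID scheme are hypothetical. The actual script may generate taxonIDs differently.

```r
# Hypothetical raw data: Species A has two distribution records
raw <- data.frame(
  scientific_name = c("Species A", "Species A", "Species B"),
  order           = c("Order X", "Order X", "Order Y"),
  location        = c("Flanders", "Wallonia", "Brussels"),
  stringsAsFactors = FALSE
)

# Hypothetical ID scheme: one sequential ID per unique species name
species     <- unique(raw$scientific_name)
taxon_ids   <- setNames(paste0("taxon-", seq_along(species)), species)
raw$taxonID <- unname(taxon_ids[raw$scientific_name])

# Taxon core: one row per distinct combination of taxon-level fields
taxon_core <- unique(raw[, c("taxonID", "scientific_name", "order")])

# Distribution extension: keeps every location row, linked by taxonID
distribution <- raw[, c("taxonID", "location")]

# If two records of a species disagreed on e.g. order, unique() would
# keep both rows and the same taxonID would appear twice: fail loudly
stopifnot(anyDuplicated(taxon_core$taxonID) == 0)

print(taxon_core)
print(distribution)
```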

Should I see any info regarding the UN geoscheme in the locality?

We could do that, but I'm not 100% fond of it, as the mapping would then be a combination of several standards, or no standard at all. I would keep it limited to the WGSRPD vocabulary.

peterdesmet commented 6 years ago

As discussed: agree on all. I would just drop the references field, but keep the references extension.

LienReyserhove commented 6 years ago

Agreed on the references!

LienReyserhove commented 6 years ago

@peterdesmet ready for review! You will need to run the script again. Because of the changes to the taxon core (removal of references), I was not able to exclude UTF-8 encoding changes from the processed files.

peterdesmet commented 6 years ago

Nice! Merge away 🚢