trias-project / rinse-pathways-checklist

🚢 RINSE - Pathways and vectors of biological invasions in Northwest Europe
https://trias-project.github.io/rinse-pathways-checklist
MIT License

First mapping #6

Closed LienReyserhove closed 6 years ago

LienReyserhove commented 6 years ago

This is a first attempt to map the data. It is now ready for review. A summary of the Darwin Core terms and other remarks follows below.

Pre-processing

  1. The scientific names were cleaned using the parsenames() function provided by rgbif
  2. taxonIDs were generated
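The thread does not say how the taxonIDs were generated. As a minimal sketch only, and assuming a hash of the cleaned scientific name is acceptable as a stable identifier, the idea could look like this in Python. The function name make_taxon_id and the prefix are hypothetical; the project itself works in R.

```python
import hashlib


def make_taxon_id(scientific_name: str,
                  prefix: str = "example-checklist:taxon:") -> str:
    """Return a stable identifier for a scientific name.

    Assumption: the ID is the prefix plus a truncated SHA-256 hash of the
    whitespace-normalized name. The real checklist may use another scheme.
    """
    # Collapse repeated whitespace so equivalent names get the same ID.
    normalized = " ".join(scientific_name.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return prefix + digest


taxon_id = make_taxon_id("Elodea canadensis")
```

Deriving the ID from the name rather than from the row order keeps it stable when rows are added or reordered in the raw file.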

Taxon core

Classification information (kingdom, phylum, class, order) was interpreted by me and should be double-checked.

Literature Reference extension

  1. A new raw file reference.xlsx was generated, containing the reference information from the supplementary information.
  2. Reference number information in the raw data file was cleaned using the function colon_to_seq (thanks to @stijnvanhoey)
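The behaviour of colon_to_seq is not shown in this thread. Assuming it expands colon-separated ranges of reference numbers (for example "1:3, 7") into the individual numbers, a Python analogue might look like the sketch below. The function name expand_reference_numbers and the input format are assumptions for illustration, not the actual helper.

```python
def expand_reference_numbers(refs: str) -> list[int]:
    """Expand a string such as "1:3, 7" into [1, 2, 3, 7].

    Assumptions: entries are comma-separated, and "a:b" means the
    inclusive range from a to b.
    """
    numbers = []
    for part in refs.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty entries such as a trailing comma
        if ":" in part:
            start, end = (int(x) for x in part.split(":"))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


expanded = expand_reference_numbers("1:3, 7")
```

Once the ranges are expanded, each number can be joined to the matching row in reference.xlsx.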

Distribution extension

  1. Date and country information was rearranged (from 8 to 2 columns)
  2. For eventDate: 2016 was taken as the end date (year of the publication)
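The rearrangement above is a wide-to-long reshape. The raw column layout is not given here, so the sketch below assumes one column per country holding the year of first record; the column names, the countryCode key and the taxonID value are hypothetical. Only the 2016 end date comes from the text.

```python
END_YEAR = 2016  # year of the publication, used as the end date


def to_long(row: dict, country_columns: list[str]) -> list[dict]:
    """Turn one wide row into one record per country with a date."""
    records = []
    for country in country_columns:
        year = row.get(country)
        if not year:
            continue  # no record for this country
        records.append({
            "taxonID": row["taxonID"],
            "countryCode": country,
            # ISO 8601 interval: first record year / end year
            "eventDate": f"{year}/{END_YEAR}",
        })
    return records


wide_row = {"taxonID": "example:1", "BE": "1995", "FR": "", "GB": "1980", "NL": None}
long_rows = to_long(wide_row, ["BE", "FR", "GB", "NL"])
```

Each resulting record corresponds to one row in the Distribution extension, with empty country cells dropped.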

Species Profile extension

Description extension

  1. Native range: information was mapped (when possible) to the WGSRPD standard
  2. Pathway information: complicated! See also #3

peterdesmet commented 6 years ago

What is the DOI of the paper and the link to the source file? I would include both in the README (for sure) and maybe also in the Rmd. Same for the references file: what is the link to the source on which this file was based?

peterdesmet commented 6 years ago

I'm starting my review here, will let you know when it's finished:

  1. I would include (and document in the Rmd) the following two mappings:
  2. With the mapping above, I would drop higherClassification
  3. taxonRank: the 3 hybrids are species, but let's leave those indicated as such for now, until we have an answer on https://github.com/gbif/portal-feedback/issues/1354

LienReyserhove commented 6 years ago

With respect to this comment:

I agree that we should include the link to the source file in the README (Rationale section), as we did for the other checklist. I will state explicitly that we found the file in the supplementary information. We should also do this for alien macroinvertebrates. In the README Workflow section I will refer to the raw data file as available on GitHub.

Similarly, the DOI and the link to the supplementary file can be given in the metadata (we already do this for the macroinvertebrates as well).

Mentioning the DOI and links in the .Rmd as well is an option. However, I don't think it's necessary: that is what the metadata is for, and we should keep the same approach for all checklists.

LienReyserhove commented 6 years ago

More concretely, I would approach it like this:

  1. README:

    • Rationale: link to the original file (here: supplementary Word document, Table 2).
    • Workflow: keep the approach as it is, i.e. refer to the source file found on GitHub. Maybe specify that it was transcribed.
  2. Metadata:

    • Refer to the DOI of the paper and if necessary, where to find the original data (in this case supplementary material)
    • Describe the whole process in detail in the step description section of the metadata (here: download the supplementary material, transcribe it to a tab-delimited file, perform the mapping, etc.)
  3. .Rmd

LienReyserhove commented 6 years ago

I also added a suggestion for the README to this PR (not all links work yet, as they refer to master)

peterdesmet commented 6 years ago

Better documentation of pathway/vector decision can be tackled after this PR

LienReyserhove commented 6 years ago

Thanks for the review. I did the following:

Ready for re-review :smiley_cat:

peterdesmet commented 6 years ago

Nice work! Ok to merge