trias-project / unified-checklist

🇧🇪 Global Register of Introduced and Invasive Species - Belgium
https://trias-project.github.io/unified-checklist/
MIT License

Include distribution regions #44

Closed damianooldoni closed 4 years ago

damianooldoni commented 5 years ago

This PR resolves #43 and adds regional distributions to the unified checklist.

Most of the changes are in 5_unify_information.Rmd, with minor changes in 6_dwc_mapping.Rmd.

Below is the new workflow, with added steps in bold and modified steps in italic:

  1. Parse temporal (eventDate) information.
  2. Filter distributions: this was already done in \@ref(filter-on-distribution).
  3. **Map locality and locationId to regional or national level (see table in #43).**
  4. **Add a Belgian distribution from regional distributions within a checklist if not present.**
  5. *Choose a single distribution within a checklist for each location (partly changed by adding locality and locationId to group_by()).*
  6. *Choose a single distribution across checklists (partly changed by adding locality and locationId to group_by()).*
  7. Save to CSV.
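
Steps 4 and 5 can be sketched roughly as follows. This is only an illustration, not the code in 5_unify_information.Rmd: the column names `datasetKey`, `taxonKey`, `startYear` and `endYear`, the example data, the regional codes, and the rule of taking the earliest start and latest end year are all assumptions.

```r
library(dplyr)

# Hypothetical input: one row per distribution record within a checklist
distribution <- tibble(
  datasetKey = c("A", "A", "B"),
  taxonKey = c(1L, 1L, 2L),
  locationId = c("ISO_3166-2:BE-VLG", "ISO_3166-2:BE-WAL", "ISO_3166-2:BE"),
  locality = c("Flemish Region", "Walloon Region", "Belgium"),
  startYear = c(1990L, 2001L, 1950L),
  endYear = c(2018L, 2015L, 2018L)
)

# Step 4 (sketch): derive a Belgian distribution for taxa that only have
# regional distributions within a checklist
national <- distribution %>%
  group_by(datasetKey, taxonKey) %>%
  filter(!any(locationId == "ISO_3166-2:BE")) %>%
  summarise(
    locationId = "ISO_3166-2:BE",
    locality = "Belgium",
    startYear = min(startYear), # assumed rule: earliest regional start
    endYear = max(endYear), # assumed rule: latest regional end
    .groups = "drop"
  )
with_national <- bind_rows(distribution, national)

# Step 5 (sketch): one distribution per location within a checklist,
# with locality and locationId added to group_by()
single <- with_national %>%
  group_by(datasetKey, taxonKey, locality, locationId) %>%
  summarise(
    startYear = min(startYear),
    endYear = max(endYear),
    .groups = "drop"
  )
```

In this example, checklist A has no national record for taxon 1, so a Belgian row is derived from its two regional rows, while checklist B keeps its existing national row unchanged.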

In the DwC mapping, only minor changes were applied:

  1. distribution %<>% mutate(dwc_locationID = locationId) instead of distribution %<>% mutate(dwc_locationID = "ISO_3166-2:BE")
  2. distribution %<>% mutate(dwc_locality = locality) instead of distribution %<>% mutate(dwc_locality = "Belgium")
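
As a small illustration of the effect of these two changes, the sketch below runs the new mutate calls on a hypothetical two-row data frame (the example rows, including the regional code, are assumptions and are not taken from the checklist):

```r
library(dplyr)
library(magrittr)

# Hypothetical distribution data after the unify step
distribution <- tibble(
  locationId = c("ISO_3166-2:BE", "ISO_3166-2:BE-VLG"),
  locality = c("Belgium", "Flemish Region")
)

# Location terms now come from the data instead of a hard-coded
# national value
distribution %<>% mutate(dwc_locationID = locationId)
distribution %<>% mutate(dwc_locality = locality)
```

With the previous hard-coded values, the second row would also have been mapped to Belgium.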

To avoid a massive number of warnings when converting Inf/-Inf to integer, I split the mutate call that calculates startYear and endYear within checklists into two steps. The change has no influence on the results, but it improves the code and its speed, as no warnings have to be returned.
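
One way such a two-step split could look is sketched below. This is not the code from the Rmd file: the columns `year_from` and `year_to` and the example data are hypothetical. The first mutate keeps the years as doubles, using Inf/-Inf as placeholders for groups without any year; the second replaces those placeholders with NA before the integer cast, so `as.integer()` never sees an infinite value.

```r
library(dplyr)

# Hypothetical input: taxon 2 has no year information at all
distribution <- tibble(
  taxonKey = c(1L, 1L, 2L),
  year_from = c(1990, 2001, NA),
  year_to = c(2018, NA, NA)
)

years <- distribution %>%
  group_by(taxonKey) %>%
  # Step 1: compute years as doubles; empty groups get Inf / -Inf
  mutate(
    startYear = if (all(is.na(year_from))) Inf else min(year_from, na.rm = TRUE),
    endYear = if (all(is.na(year_to))) -Inf else max(year_to, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  # Step 2: turn infinite placeholders into NA, then cast to integer
  mutate(
    startYear = as.integer(if_else(is.infinite(startYear), NA_real_, startYear)),
    endYear = as.integer(if_else(is.infinite(endYear), NA_real_, endYear))
  )
```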

As the last commit, I applied styler::style_file() to the two Rmd files. I advise using it on all other mapping steps as well.
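
For reference, a minimal sketch of what `styler::style_file()` does to an Rmd file, using a throwaway temporary file rather than the files in this repository (the file content is hypothetical):

```r
library(styler)

# Write a small, badly formatted Rmd file to a temporary location
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(paste0(fence, "{r}"), "x<-c(1,2,3)", fence), rmd)

# Restyle the R chunks in place; the returned data frame reports
# per file whether it was changed
result <- style_file(rmd)
styled <- readLines(rmd)
```

To apply it to the mapping steps themselves, the same call can be pointed at each Rmd file in the repository.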