trias-project / rinse-pathways-checklist

🚢 RINSE - Pathways and vectors of biological invasions in Northwest Europe
https://trias-project.github.io/rinse-pathways-checklist
MIT License

Use gather on pathway only #8

Closed: peterdesmet closed this issue 6 years ago

peterdesmet commented 6 years ago

For mapping pathway and vector, it is easier to gather only the pathway columns into rows (as we will need each one of these) and remove NAs, while leaving the vectors as columns.
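
A minimal sketch of this reshaping on a small, made-up checklist (the species names and "Y"/NA values are hypothetical; the column names follow the snippet below). Only the `pathway_*` columns are gathered into rows and empty pathways are dropped, while the `vector_*` columns stay as columns:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy checklist: one row per species, "Y" or NA per column
checklist <- tibble(
  species = c("species_a", "species_b"),
  pathway_import_escape = c("Y", NA),
  pathway_accidental = c("Y", "Y"),
  vector_leisure = c("Y", NA),
  vector_industry = c(NA, "Y")
)

# Gather only the pathway columns into rows and drop empty (NA) pathways;
# the vector columns are left untouched as columns
pathways_long <- checklist %>%
  gather(pathway, value, starts_with("pathway_"), na.rm = TRUE) %>%
  select(-value)

print(pathways_long)
```

In current tidyr, `gather()` is superseded; `pivot_longer(starts_with("pathway_"), names_to = "pathway", values_drop_na = TRUE)` should give the same rows, possibly in a different order.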

Code suggestion (from Exploratory):

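library(tidyverse) # dplyr, tidyr and readr functions used below
# Pipe your checklist data frame into the steps below
# 1. Select columns I want (this step is not required)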
select(species, pathway_import_release, pathway_import_escape, pathway_accidental, pathway_dispersal, vector_ornamental, vector_leisure, vector_industry, vector_biocontrol, vector_research) %>%

# 2. Transform "Y" values to "T" and make them logicals.
# This step is a bit verbose (and not required), but makes the mapping (step 6) a bit more readable
mutate(
  pathway_import_release = parse_logical(recode(pathway_import_release, "Y" = "T")),
  pathway_import_escape = parse_logical(recode(pathway_import_escape, "Y" = "T")),
  pathway_accidental = parse_logical(recode(pathway_accidental, "Y" = "T")),
  pathway_dispersal = parse_logical(recode(pathway_dispersal, "Y" = "T")),
  vector_ornamental = parse_logical(recode(vector_ornamental, "Y" = "T")),
  vector_leisure = parse_logical(recode(vector_leisure, "Y" = "T")),
  vector_industry = parse_logical(recode(vector_industry, "Y" = "T")),
  vector_biocontrol = parse_logical(recode(vector_biocontrol, "Y" = "T")),
  vector_research = parse_logical(recode(vector_research, "Y" = "T"))
) %>%

# 3. Gather pathway, remove NA
gather(pathway, value, starts_with("pathway_"), na.rm = TRUE, convert = TRUE) %>%

# 4. Column "value" will only contain "TRUE" (or "Y" if you skip step 2), so no need for this column
select(-value) %>%

# 5. Arrange by species to see things more in context (not required)
arrange(species) %>%

# 6. The mapping itself (7 conditions instead of 11 steps). Maybe include an _else_ (TRUE ~ ...) at the bottom
mutate(CBD = case_when(
    pathway == "pathway_accidental" ~ "stowaway,contaminant",
    pathway == "pathway_dispersal" ~ "corridor,natural_dispersal",
    pathway == "pathway_import_escape" & vector_leisure ~ "escape_food_bait",
    pathway == "pathway_import_escape" & vector_research ~ "escape_research",
    pathway == "pathway_import_escape" ~ "escape",
    pathway == "pathway_import_release" & vector_biocontrol ~ "release_biocontrol",
    pathway == "pathway_import_release" ~ "release"
)) %>%

# 7. Separate "stowaway,contaminant", ... into two columns (fill = "right" pads single values with NA without a warning)
separate(CBD, into = c("CBD_1", "CBD_2"), sep = "\\s*,\\s*", remove = TRUE, convert = TRUE, fill = "right") %>%

# 8. Gather 2 columns into 2 rows
gather(key, value, starts_with("cbd_"), na.rm = TRUE, convert = TRUE) %>%

# 9. Sort to show context per species
arrange(species)
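
Steps 7-8 (separate into two columns, then gather them back into rows) can also be done in one step with tidyr's `separate_rows()`. This is a swapped-in alternative, not part of the original suggestion, shown on hypothetical rows as they might look after step 6:

```r
library(dplyr)
library(tidyr)

# Hypothetical mapped rows, as they might look after step 6
mapped <- tibble(
  species = c("species_a", "species_a", "species_b"),
  pathway = c("pathway_accidental", "pathway_import_escape", "pathway_dispersal"),
  CBD = c("stowaway,contaminant", "escape", "corridor,natural_dispersal")
)

# Split comma-separated categories into one row each (replaces steps 7-8)
cbd_long <- mapped %>%
  separate_rows(CBD, sep = "\\s*,\\s*") %>%
  arrange(species)

print(cbd_long)
```

Unlike `separate()` + `gather()`, this keeps no `key` column recording whether a value was the first or second category.
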
peterdesmet commented 6 years ago

Updated approach:

# 1-5. Identical to above

# 6. Map pathways to cbd level 1 category
mutate(cbd = recode(pathway,
  "pathway_accidental" = "stowaway,contaminant",
  "pathway_dispersal" = "corridor,natural_dispersal",
  "pathway_import_escape" = "escape",
  "pathway_import_release" = "release"
)) %>%

# 7. Update some values to cbd level 2 category based on vector
mutate(cbd = case_when(
  cbd == "escape" & vector_research & vector_leisure ~ "escape_research,escape_food_bait",
  cbd == "escape" & vector_research ~ "escape_research",
  cbd == "escape" & vector_leisure ~ "escape_food_bait",
  cbd == "release" & vector_biocontrol ~ "release_biocontrol",
  TRUE ~ cbd # Leave other values as is
)) %>%

# 8. Split comma-separated values into two columns (fill = "right" avoids warnings for single values)
separate(cbd, into = c("cbd_1", "cbd_2"), sep = "\\s*,\\s*", remove = TRUE, fill = "right") %>%

# 9. Gather
gather(key, value, starts_with("cbd_"), na.rm = TRUE, convert = TRUE) %>%

# 10. Arrange
arrange(species)

LienReyserhove commented 6 years ago

Close, but still not 100% correct :smile: E.g. Acipenser baerii Brandt, 1869 now has pathway escape_food_bait, while it is also introduced by vector industry.

I suggest adapting the approach a bit: map the vector information only when (1) exclusively one vector was used to introduce the species AND (2) that vector maps clearly to the CBD standard. In our case, the code would look like this:

# 1-6. Identical
# 7. Update to cbd level 2 category
# (vector columns are logicals after step 2: TRUE or NA)
mutate(cbd = case_when(
  cbd == "escape" &
    vector_research &
    vector_leisure &
    is.na(vector_ornamental) &
    is.na(vector_biocontrol) &
    is.na(vector_industry) ~ "escape_research,escape_food_bait",
  cbd == "escape" &
    vector_research &
    is.na(vector_leisure) &
    is.na(vector_ornamental) &
    is.na(vector_biocontrol) &
    is.na(vector_industry) ~ "escape_research",
  cbd == "escape" &
    is.na(vector_research) &
    vector_leisure &
    is.na(vector_ornamental) &
    is.na(vector_biocontrol) &
    is.na(vector_industry) ~ "escape_food_bait",
  cbd == "release" &
    is.na(vector_research) &
    is.na(vector_leisure) &
    is.na(vector_ornamental) &
    vector_biocontrol &
    is.na(vector_industry) ~ "release_biocontrol",
  TRUE ~ cbd # Leave rest as is
)) %>%

# 8-10. Identical
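
The same rule can be written more compactly by counting the recorded vectors per row instead of listing every `is.na()` check. This is a sketch, not the code used in the repository: it assumes the vector columns are logicals (TRUE/NA, as after step 2), and the helper column `n_vectors` and the toy rows are hypothetical. Rows whose vectors do not all map (like the leisure + industry case above) keep their level 1 category:

```r
library(dplyr)

# Hypothetical rows after step 6; vector columns are TRUE or NA (as after step 2)
rows <- tibble(
  species = c("species_a", "species_b", "species_c", "species_d"),
  cbd = c("escape", "escape", "escape", "release"),
  vector_ornamental = NA,
  vector_leisure = c(TRUE, TRUE, NA, NA),
  vector_industry = c(TRUE, NA, NA, NA),
  vector_biocontrol = c(NA, NA, NA, TRUE),
  vector_research = c(NA, NA, TRUE, NA)
)

# Hypothetical helper column: number of recorded (non-NA) vectors per row
rows$n_vectors <- rowSums(!is.na(select(rows, starts_with("vector_"))))

mapped <- rows %>%
  mutate(cbd = case_when(
    # Research and leisure are the only two vectors
    cbd == "escape" & vector_research & vector_leisure & n_vectors == 2 ~
      "escape_research,escape_food_bait",
    # Exactly one vector, and it maps to a level 2 category
    cbd == "escape" & vector_research & n_vectors == 1 ~ "escape_research",
    cbd == "escape" & vector_leisure & n_vectors == 1 ~ "escape_food_bait",
    cbd == "release" & vector_biocontrol & n_vectors == 1 ~ "release_biocontrol",
    TRUE ~ cbd # Leave other values as is
  )) %>%
  select(-n_vectors)

print(mapped$cbd)
```

Here `case_when()` treats NA conditions as FALSE, so a missing vector simply fails the test. The resulting categories should be `escape`, `escape_food_bait`, `escape_research` and `release_biocontrol`: species_a, with leisure and industry, keeps `escape`.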