create function for climate matching

timadriaens commented 3 years ago

@damianooldoni within the RIPARIAS project @SanderDevisscher and me developed a procedure for climate matching, the idea is described in this issue and further developed in the RIPARIAS repo for crayfish and aquatic plants.

This issue is meant to make this into a function within the trias package.

timadriaens commented 3 years ago

This workflow, as well as the one for the RIPARIAS climate matching, would be good to consolidate as a package/function, the same way it is done for TrIAS. I think there is great scope for it to be used worldwide by the invasion community, for example for horizon scanning, drafting alert lists, risk assessment etc.

Functionalities:

[ ] add gbif key or list of keys
[ ] set level of minimum coordinate uncertainty (see)
[ ] specify BasisOfRecord
[ ] provide possibility to define tresholds on minimum number of gbif records
[ ] provide possibility to define tresholds on minimum %overlap
[ ] plot map of the world with KG climate classification as underground and the occurrence data plotted on it
[ ] plot global map of climatic suitability for the species based on %overlap
[ ] return dataframe with species, number of occurrences per KG classification category and %overlap (for both current and future climate)

@SanderDevisscher are there any other functionalities you can think of?

SanderDevisscher commented 3 years ago

Maybe it could be useful to be able set the Region you do the matching for (default belgium). can be achieved by (belgium in example):

region <- "belgium"
library(rworldmap)
worldmap <- getMap(resolution = "low")
region_shape <- subset(worldmap, tolower(worldmap$ADMIN) == tolower(region)) 

plot(region_shape)

close enough :-)

SanderDevisscher commented 3 years ago

@damianooldoni one question though how do I best store/access the Koppen Geiger shapes ? Currently the are saved as shapes in the riparias-prep repository. I can access these via readOGR(resp. githublinks) but I don't know if this is a good long term solution.

timadriaens commented 3 years ago

indeed, setting the country (and perhaps also the continent as many horizon scans are made at continental level) - I am forgetting the obvious :-)

SanderDevisscher commented 3 years ago

(and perhaps also the continent as many horizon scans are made at continental level)


library(rworldmap)
worldmap <- getMap(resolution = "low")
region <- "AfriCa"
if(tolower(region) %in% c("south america", "asia", "africa", "europe", "australia", "antarctica", "north america"){
region_shape <- subset(worldmap, tolower(worldmap$REGION) == tolower(region)) 
}

plot(region_shape)


![image](https://user-images.githubusercontent.com/16779320/117472444-2dc88c00-af59-11eb-8a37-ffb557cf7592.png)

damianooldoni commented 3 years ago

A package should be self contained. Based on the R Packages book I often consult, I would add the shapefiles to the package (as .RData). In this way, as we have in package DESCRIPTION, option LazyData: true, the shapefile will be lazily loaded. This means that it won't occupy any memory until you use it.

@SanderDevisscher: could you please tell me exactly which shapefiles the function will need (directly or indirectly)? Then I can already add them to the package. Thanks. In this way, you can just use them, exaclyt like you can use matcars when loading dplyr :smile:

SanderDevisscher commented 3 years ago

@SanderDevisscher: could you please tell me exactly which shapefiles the function will need (directly or indirectly)? Then I can already add them to the package. Thanks. In this way, you can just use them, exaclyt like you can use matcars when loading dplyr 😄

@damianooldoni currently we use the shapes in https://github.com/inbo/riparias-prep/tree/master/data/spatial/koppen-geiger. I can transform these into .geojson if needed. However we are now discussing whether or not these 2 match sufficiently accurately (see https://github.com/inbo/riparias-prep/discussions/19). This can result in the need to include more shapes in the future.

damianooldoni commented 3 years ago

Transformation in .geojson is not needed as we will save them in trias package as R objects (.Rdata). I have just now checked adding perimeter_shape shapefile as sf data.frame and it worked flawlessly:

class(trias::perimeter_shape)
#> [1] "sf"         "data.frame"
nrow(trias::perimeter_shape)
#> [1] 3

^{Created on 2021-05-10 by the reprex package (v2.0.0)}

I would use as much as possible sf instead of "old" and less maintained sp package, unless some functionalities have still not be rewritten to support sf (not all pkgs maintainers updated their functions to work with sf as well). But this is a minor issue, as we can save sp objects as .Rda files as I have just now tested with sf objects.

So, if I well understand, it's better to wait until https://github.com/inbo/riparias-prep/discussions/19 is solved before adding objects to trias package?

SanderDevisscher commented 3 years ago

@damianooldoni I'm more familiar with the sp package.

So, if I well understand, it's better to wait until inbo/riparias-prep#19 is solved before adding objects to trias package?

~I'm very uncertain this will go anywhere since I can't access the website containing these other shapes.~ UPDATE: A work arround has been found waiting for solution is advisory

damianooldoni commented 3 years ago

great :-) Just let me know in this issue when and what I should add (I can sf object add.:you can always translate them in sp in the function. transforming to sf can be done in a later stage when I have more time.

SanderDevisscher commented 3 years ago

@damianooldoni an update. I've rewritten the climate matching flow according to https://github.com/inbo/riparias-prep/issues/23. Its currently in review see https://github.com/inbo/riparias-prep/pull/26. When this PR is approved I'll start with the function.

damianooldoni commented 3 years ago

Thanks @SanderDevisscher for the update. Do you know also which shapefiles should be added to the package based on the last climate matching workflow?

SanderDevisscher commented 3 years ago

@damianooldoni that would be these shapes: https://github.com/inbo/riparias-prep/tree/23_New_Scenarios/data/spatial/koppen-geiger/future https://github.com/inbo/riparias-prep/tree/23_New_Scenarios/data/spatial/koppen-geiger/observed

SanderDevisscher commented 3 years ago

@damianooldoni since the PR has been approved I'll start working on the function. @timadriaens I'll leave the plotting of maps for now and start with creating the function according to this flow chart. Notice I plan export the occurrence data as spatialpoints this should allow easy plotting on a map after the function has done its work.

CM_function_flow

@damianooldoni did you upload the shapes in a branch ? ifso which branch ? shall I continue in this branch?

damianooldoni commented 3 years ago

@SanderDevisscher: no, still not. Maybe you can start a new branch where you can add the shapefiles as Rdata as first commit? How to describe them? Via a data.R file. Get inspired by this example, or this one . Notice how you can indeed split data documentation in different file, but all of them starting with data (in dplyr you have data-bands.R, data-starwars.R, etc. . We can split ours in data-observed.R and data-future.R if there a lot of shapefiles to be described.

The .rda files should be placed in ./data/ folder.

SanderDevisscher commented 3 years ago

73_climate_matching_function - branch

damianooldoni commented 3 years ago

as you wish. I am not very picky about branch names :smile:

SanderDevisscher commented 3 years ago

it is just a headsup I'll be working on this branch ;-)

SanderDevisscher commented 3 years ago

@damianooldoni I'm having an issue downloading occurrence data from gbif

Error in occ_download_get(set1, overwrite = TRUE) : 
  response content-type != application/octet-stream; q=0.5

do you have any idea how to solve this ?

damianooldoni commented 3 years ago

@SanderDevisscher : about https://github.com/trias-project/trias/issues/73#issuecomment-844843053 could you give me more details about the error and what set1 is?

SanderDevisscher commented 3 years ago

@damianooldoni set1 is the list of predicates to setup the download. In this case:

taxonkey_set1 <- pred_in("taxonKey", taxonkey)

set1 <- occ_download(taxonkey_set1, 
                       pred("hasCoordinate", TRUE),
                       user = gbif_user, 
                       pwd = gbif_pwd, 
                       email = gbif_email)

I get the error when running:

 repeat{
    Sys.sleep(time = 5)
    test_set1 <- occ_download_meta(set1)
    if(test_set1$status == "SUCCEEDED"){
      data <- occ_download_get(set1,
                               overwrite = TRUE) %>% 
        occ_download_import()
      break()
    }
    print(test_set1$status)
  }

also when I only run this part:

data <- occ_download_get(set1,
                               overwrite = TRUE) %>% 
        occ_download_import()

damianooldoni commented 3 years ago

I see this on https://www.gbif.org/health

This could be the reason.

SanderDevisscher commented 3 years ago

Highly likely

damianooldoni commented 3 years ago

@SanderDevisscher: I have improved documentation and structure of Rda climate shapes data in https://github.com/trias-project/trias/commit/dcb01b310d42149e43eb338405e0eff223c38eb0. The main issue is that the name you mention in ./R/data.R should be equal to the data.frame within the Rda file. So, I found the best way to solve this by making a list of climate shapes. In this way, observed.Rda contains an object, a list, called observed with 5 slots and each slot is a spatial dataframe. You can unlist it before using it, or benefit of the list structure if you like. Same for future and legend. I improved the name of a legend slot: now both legends are called like the models they refer to: A1FI and Beck. Personally, I would remove also the prefix KG_ (what does it mean???) and the suffix _legend (not needed and present only in one of the legends).

I also improved the documentation: notice how you can document three datasets in one file by means of @name for the first dataset and @rdname for the other ones, so linking to documentation of the first one. By using @family we can automatically link documentation of the functions you are going to add to this documentation as well. Just add same @family field in function doc.

One question: is the overlapping in observed data correct? I see a shapefile 1976-2000 and a shapefile 1980-2016. Is this done to have the same duration, 25 years? Thanks.

SanderDevisscher commented 3 years ago

Personally, I would remove also the prefix KG_ (what does it mean???) and the suffix _legend (not needed and present only in one of the legends).

The KG comes from Köppen–Geiger climate classification system see https://en.wikipedia.org/wiki/K%C3%B6ppen_climate_classification.

One question: is the overlapping in observed data correct? I see a shapefile 1976-2000 and a shapefile 1980-2016. Is this done to have the same duration, 25 years? Thanks.

@damianooldoni the shapefile from 1976 - 2000 (and prior) are from Rubel & Kottek 2010 while the one from 1980-2016 is the observed climate according to Beck et al. We've decided to use the 1980-2016 - shape for the observations after 2000, mainly because the 2001 - 2025 shape from Rubel & Kottek 2010 (in the future datapack) is a simulation, not a observation.

damianooldoni commented 3 years ago

Thanks @SanderDevisscher for clarifying this. I have to fine tune the documentation then as this is not true. About KG in legend, of course :facepalm: I will just have to add Köppen and Geiger in the legend documentation.

damianooldoni commented 3 years ago

Voila. Done. Please, give a look. If it's ok, we can move on :partying_face:

SanderDevisscher commented 3 years ago

After discussion with @timadriaens we decided to try to add 2 map outputs see updated flowchart: CM_function_flow

timadriaens commented 3 years ago

indeed @SanderDevisscher, well done. Good to have some map outputs showing suitable area for the species (and with the realised niche also) under current and future climate, and also a map which uses the %overlap in the legend as a degree of climatic suitability.

timadriaens commented 3 years ago

The idea was to corroborate the workflow using a set of species currently under PRA for Europe, and for which detailed species distribution models have been drafted to assess climatic and environmental suitability. It would therefore be nice to test it on this set of species:

Species	GbifKey
Marisa cornuarietis	2292586
Cherax destructor	4417558
Mulinia lateralis	2286388
Obama nungara	8701771
Bipalium kewense	2502938
Pycnonotus jocosus	2486151
Cervus nippon	2440954
Delairea odorata	3134183

can you try run the matching?

SanderDevisscher commented 3 years ago

@timadriaens this are the outputs of the current version of the function for the specieslist you provided. These are the settings used:

climate_match(region = "belgium", taxonkey = c(2292586,
                                                       4417558,
                                                       2286388,
                                                       8701771,
                                                       2502938,
                                                       2486151,
                                                       2440954,
                                                       3134183),
                      zipfile)

_Notice: not setting n_totaal, perc_climate, coord_unc, BasisOfRecord results in no respective filters being applied._

cm_test.zip

timadriaens commented 3 years ago

OK tx, useful! Can you trial the same workflow at the EU level?

SanderDevisscher commented 3 years ago

Rerun at European (geographic europe) level

cm_test_europe.zip

timadriaens commented 3 years ago

Tx, great! I included it in the PRA (let's see if they like it...): "Using the climate matching methodology of Faulkner et al. (2014), and using the available validated distribution data (N = 7,208) of P. cafer in the native and invasive range gathererd for the species distribution model (Annex VIII), we determined the degree (%) overlap with Köppen Geiger climate classification classes (Beck et al. 2018) present in the risk assessment area. This shows that under current climate (2001-2025), there are similarly suitable climates for P. jocosus in the risk assessment area, primarily humid subtropical climate (Cfa, 36% overlap), temperate oceanic climate (Cfb, 16% overlap), to a lesser extent other climates such as hot-summer Mediterranean climate (Csa, 3% overlap) which are present in a number of EU Member States (cf. https://www.plantmaps.com/koppen-climate-classification-map-europe.php). "

timadriaens commented 3 years ago

It would be useful to include the names of the climate classes on top of the codes in the outputs and to provide map outputs as we discussed (using a colour scaling representing the % overlap). In any case: straightforward, but really great work!

timadriaens commented 3 years ago

@SanderDevisscher quick cross-check with the results of a detailed SDM: Croatia, Italy and Greece do have Cfa climates so that is corroborated. Portugal and Spain have considerable area of Csa climates. So it looks like the rough climate matching is refined with the SDM but doesn't contradict it, which is great. Here is for instance the predicted suitability under current climate from the SDM, it would be nice to be able to compare this with some map outputs from our crude climate matching workflow

SanderDevisscher commented 3 years ago

we determined the degree (%) overlap with Köppen Geiger climate classification classes (Beck et al. 2018) present in the risk assessment area.

should be: we determined the degree (%) overlap with Köppen Geiger climate classification classes (Rubel & Kottek 2010, Beck et al. 2018) present in the risk assessment area.

Since we used both Rubel & Kottek (observations prior to 2001) & Beck (observations since 2001). Both are updates of the original Köppen Geiger maps.

timadriaens commented 3 years ago

tx! updated...

SanderDevisscher commented 3 years ago

@damianooldoni can you update these files in the legend.rda ?

KG_Beck_Legend.txt KG_A1FI.txt

SanderDevisscher commented 3 years ago

@timadriaens this is a first draft of current climate matching map for Marisa cornuarietis (Linnaeus, 1758) version 1: na.color = "#e0e0e0"

version 2: na.color = "#053061"

which do you prefer ?

timadriaens commented 3 years ago

the first one looks awesome (for sure the mainland background should be pale/white). Can we make sure we use a colour blind proof palette? Also, if the sea is blue (which I prefer), we should perhaps not use blue for suitable area. How come there is no red on the map?

And a final thought: the output is a table with KG and %suitable, and a map showing where that is, but still you cannot link the table with the map (i.e. you don't know where the KG climate classes are and with which colours they correspond). Would there be a way to visualize the KG class on the map perhaps?

damianooldoni commented 3 years ago

@SanderDevisscher: while working on updating legend.rda as asked in https://github.com/trias-project/trias/issues/73#issuecomment-867524195, I see that the structure of KG_A1FI.txt is quite different and there are no column names anymore. How should I interpret this file? The structure of both legends up to now is to provide two columns, GRIDCODE and Classification. If this changes, please let me know something more so that I can update the documentation. Thanks a lot.

SanderDevisscher commented 3 years ago

@timadriaens I updated the current climate map to work with multiple species. I also added a popup and changed the color of the sea, see below:

SanderDevisscher commented 3 years ago

@SanderDevisscher: while working on updating legend.rda as asked in #73 (comment), I see that the structure of KG_A1FI.txt is quite different and there are no column names anymore. How should I interpret this file? The structure of both legends up to now is to provide two columns, GRIDCODE and Classification. If this changes, please let me know something more so that I can update the documentation. Thanks a lot.

Somehow the export got corrupted !! I'll look into it.

SanderDevisscher commented 3 years ago

I used write_rds for KG_A1FI :-/

should be fixed now KG_A1FI.txt

SanderDevisscher commented 3 years ago

@timadriaens I finished both the maps you requested. The function now also exports 1 map displaying the current climate match in the "current_map" - slot & a list of future scenario climate matching maps in the "future_maps" - slot.

SanderDevisscher commented 3 years ago

In dit voorbeeld de climate matching voor de periode 2076 - 2100 van de goldenhorn

SanderDevisscher commented 3 years ago

@damianooldoni This function is allmost ready for review. Just the legend files need to be updated.

SanderDevisscher commented 3 years ago

@timadriaens Do have some time to look at the function in the coming days ?

damianooldoni commented 3 years ago

Hi @SanderDevisscher: legend rda file is updated 👍 I also improved the documentation of the rda files: I split such documentation in three different files as it improved a lot the readibility. I made some basic maintenance of the package as well. Notice also that you can use normal Markdown syntax to write package documentation in R since a year or so. This is enabled by line Roxygen: list(markdown = TRUE) in DESCRIPTION.

trias-project / trias

create function for climate matching #73