inbo / riparias-prep

Preparatory scripts and data management for the RIPARIAS workflow
MIT License

Create climate matching flow for the plants #8

Closed: SanderDevisscher closed this issue 3 years ago

SanderDevisscher commented 3 years ago

As for the crayfish, a similar flow should be created for the invasive plants.

@timadriaens is this also @adrienlatli's expertise, or should I include someone else?

The following steps should be undertaken (preferably in order) :

adrienlatli commented 3 years ago

@SanderDevisscher: For invasive plants the workflow should be more or less the same. Etienne Branquart (SPW) and Antoine Dumortier (student) will lead this study for the SPW. For the crayfish, you could invite Romain Willeput (student) on GitHub; he has created a GitHub account (https://github.com/RomainWilleput). Thanks

SanderDevisscher commented 3 years ago

> @SanderDevisscher: For invasive plants the workflow should be more or less the same. Etienne Branquart (SPW) and Antoine Dumortier (student) will lead this study for the SPW.

Can you provide the usernames of Etienne & Antoine ?

> For the crayfish, you could invite Romain Willeput (student) on GitHub; he has created a GitHub account (https://github.com/RomainWilleput)

@RomainWilleput should have had an invitation for this repo.

@timadriaens is there a reason why you unassigned us both ? No input is required ?

timadriaens commented 3 years ago

@SanderDevisscher no, I made a mistake probably while trying to assign Romain :-)

SanderDevisscher commented 3 years ago

> @SanderDevisscher no, I made a mistake probably while trying to assign Romain :-)

OK, I'll reassign us both. Before we can assign Romain (or Etienne & Antoine for that matter) to an issue, they first need to accept the invitation.

adrienlatli commented 3 years ago

The github account of Antoine : https://github.com/Andumort

SanderDevisscher commented 3 years ago

> The github account of Antoine : https://github.com/Andumort

He has received an invitation

SanderDevisscher commented 3 years ago

Can someone provide me with a white-list ? Or at least the families of plants I should investigate?

Andumort commented 3 years ago

Hi @SanderDevisscher, here is the Google Sheet link to the plant species list that I compiled from the catalogs of several Belgian, French, German and Dutch wholesalers:

https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing

There is a column with the GBIF name and one with the GBIF taxon ID (if available). Let me know if you have access to it.

Andumort commented 3 years ago

Oops, I had not given everyone access. This one should work:

https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing

SanderDevisscher commented 3 years ago

@Andumort I can access the file. Just a few questions:

  1. Can you give a header to every column (see D)?
  2. Can you rename column E to taxonKey ?
  3. For the climate matching we'll be limited to those species that have a taxonKey (the default and most robust way to download data). Is this a problem?

SanderDevisscher commented 3 years ago

8_plants_cm - branch

SanderDevisscher commented 3 years ago

GBIF download using the taxonKeys in the Google Sheet (image)

63 species introduced in Belgium and 190 species not yet introduced

SanderDevisscher commented 3 years ago

Without thresholds the climate matching results in 143 species for the future scenario & 121 species for the current one.

It's up to you guys now to determine logical thresholds.

Andumort commented 3 years ago

Hello @SanderDevisscher,

As Romain just told you, the CM script looks good. You can use the same one for plants.

We finally managed to get the missing taxonKeys. Could you rerun the CM with this final list (which has the same link as before): https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing

As for the crayfish, could you send me the entire list of results, with the total number of occurrences, even for species without matching? That way everything will be as transparent as possible for us.

Also, I observed a problem with the species "Pontederia cordata", which could be a potential candidate for the European List and which has a lot of occurrences in GBIF (Belgium included). This species does not appear in the final results. I think the problem comes from the "occurrence status" on GBIF, which seems to have a problem with the "present" value. Could you confirm this? If this is the case, do you see a way to avoid this for other species that might be in the same situation?

Thanks a lot for all of this, Have a nice day, Antoine

SanderDevisscher commented 3 years ago

> We finally managed to get the missing taxonKeys. Could you rerun the CM with this final list (which has the same link as before): https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing

I'll do a rerun soon

SanderDevisscher commented 3 years ago

@Andumort I fixed the flow like I did for the crayfish. These are the results for the plants CM:

when using the same thresholds as the Crayfish:

It's up to you guys to determine if the thresholds should be amended.

Andumort commented 3 years ago

Hi @Sander, Thank you for all of this.

We have just discussed this with Adrien, Etienne and Romain. If we are not mistaken, the species that have at least one observation in Belgium are excluded from the analysis?

We think that we will not necessarily exclude all the species that have an observation in Belgium. Could you therefore rerun the analysis with all 202 species in the link below:

https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing

You can send me the results for all species, without thresholds, we'll fix them ourselves afterwards.

Thank you so much !

SanderDevisscher commented 3 years ago

@Andumort I'm currently working on a trias package function for climate matching, see #73

This function does not exclude observations made in the specified region (e.g. Belgium). The unfiltered list of climate-matched species is also an output.

When this function is completed I'll use it to rerun the plants CM. However, I'm currently very busy, so it won't be any time soon.

Andumort commented 3 years ago

Hello @SanderDevisscher,

Do you think that the CM of the missing species (with occurrences in Belgium) can be carried out in the coming days? I need the values and climatic zones for these species; otherwise I cannot complete the risk analyses and will not be able to produce coherent work.

I'm sorry to make this urgent, but my master's thesis must be submitted by August 10 at the latest, which doesn't give me much time to complete the analyses and the writing.

Link to species with GBIF Taxonkey : https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing

Thank you very much in advance for your work and your response.

timadriaens commented 3 years ago

Hi @SanderDevisscher, Antoine and I tried to run the script on his side so he could do the analysis himself, but something went wrong on the rgbif side. Furthermore, when I ran the script on my end for the species Acorus gramineus (GBIF key 2873807), the CM returned an N_total of 50, whereas there are many more records on GBIF for this species (even though no filtering was on). We therefore gave up and decided to wait for you :-)

I added a file here, provided by Antoine, with the species for which climate matching still needs to be performed. It would be good if you could run it for the total list (tab Missing_CM), but in case of time constraints you can focus on the 18 focal species that are most needed (tab Missing CM_Most needed). Missing_ClimateMatching.xlsx

SanderDevisscher commented 3 years ago

I'm going to rework this flow to use the new Trias climate matching function.

Andumort commented 3 years ago

OK, thanks. I just hope that it will not change the results too much compared to the last ones; I do not have much time left :/

SanderDevisscher commented 3 years ago

Is https://docs.google.com/spreadsheets/d/18HNS3HCNbhvTWIn-6iJHQXsPQbfBtOnNZHi7vZRyYDk/edit?usp=sharing still correct?

Andumort commented 3 years ago

Yes !

SanderDevisscher commented 3 years ago

@Andumort When filtering on BasisOfRecord = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN", "UNKNOWN"), all records with coordinates for Samolus valerandi subsp. parviflorus (Raf.) Hultén are omitted. In the current version of the function this results in an error (which I'll fix).

Is it correct/preferable to use the species instead of the subspecies? Samolus valerandi L. ?
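The effect described above can be illustrated with a minimal sketch. It is written in Python rather than the R used by the actual flow, and it is not the trias function: the helper names and the toy records are hypothetical, and only the three basisOfRecord values come from the filter quoted above. It counts, per taxon, the records that pass the filter and have coordinates, then separates taxa that are left with nothing instead of raising an error.

```python
from collections import Counter

# The three values come from the filter quoted in the comment above.
ALLOWED_BASIS = {"HUMAN_OBSERVATION", "PRESERVED_SPECIMEN", "UNKNOWN"}


def usable_record_counts(records, taxon_keys):
    """Count, per taxon, the records that pass the basisOfRecord filter
    and have both coordinates. Taxa with no usable record get a count of 0."""
    counts = Counter({key: 0 for key in taxon_keys})
    for rec in records:
        has_coords = (
            rec.get("decimalLatitude") is not None
            and rec.get("decimalLongitude") is not None
        )
        if rec.get("basisOfRecord") in ALLOWED_BASIS and has_coords:
            counts[rec["taxonKey"]] += 1
    return dict(counts)


def split_by_data(counts):
    """Separate taxa that still have records from taxa left with none."""
    kept = sorted(key for key, n in counts.items() if n > 0)
    dropped = sorted(key for key, n in counts.items() if n == 0)
    return kept, dropped


# Hypothetical toy records; taxon keys 1 and 2 are placeholders, not real GBIF keys.
records = [
    {"taxonKey": 1, "basisOfRecord": "HUMAN_OBSERVATION",
     "decimalLatitude": 50.8, "decimalLongitude": 4.4},
    {"taxonKey": 2, "basisOfRecord": "MATERIAL_SAMPLE",
     "decimalLatitude": 51.0, "decimalLongitude": 3.7},
    {"taxonKey": 2, "basisOfRecord": "PRESERVED_SPECIMEN",
     "decimalLatitude": None, "decimalLongitude": None},
]

counts = usable_record_counts(records, taxon_keys=[1, 2])
kept, dropped = split_by_data(counts)
print(kept, dropped)
```

In this toy data, taxon 2 has one record with an excluded basisOfRecord and one without coordinates, so it ends up in the dropped list, which is the situation reported for the subspecies.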

Andumort commented 3 years ago

Hi @SanderDevisscher,

The species Samolus valerandi L. is native to Belgium (therefore eliminated from the species to be analyzed). This is why I had kept the subspecies "Samolus valerandi subsp. parviflorus (Raf.) Hultén" present in the trade in this case.

If the number of georeferenced occurrences is too low with the filters set, the species is considered to have too few occurrences to achieve climate matching and is classified in the "data deficient" list. In the vast majority of cases, the species which have very few occurrences on GBIF are also very poorly documented in the literature, which does not allow for the risk analysis anyway.

So taking the species here is pointless, and if the occurrences are too low for the subspecies, that may be a valid result.

SanderDevisscher commented 3 years ago

I've reworked the function so that species without sufficient spatial data are omitted from the results.

Andumort commented 3 years ago

Ok great,

When do you think you can send me the results ?

Thank you

SanderDevisscher commented 3 years ago

I hope to have them by this afternoon

Andumort commented 3 years ago

Hi @SanderDevisscher,

Thank you very much! It looks good to me compared to the last results.

So, if I understood correctly, the new script doesn't eliminate occurrences < 1950?

Could you specify the other changes you made and the final filters/script used, please? I want to be sure that I don't say something wrong in my paper. Is it:

Thanks for your work !

SanderDevisscher commented 3 years ago

> So, if I understood correctly, the new script doesn't eliminate occurrences < 1950?

Correct, we now filter out occurrences from before 1901.

> Could you specify the other changes you made and the final filters/script used, please? I want to be sure that I don't say something wrong in my paper. Is it:

> occurrenceStatus = "Present"

Yes

> basisOfRecord = "Human observation" + "Preserved specimen" + "Unknown" ==> Why those specific records only?

Yes, to eliminate records with dubious origin and/or location.

> remove duplicates when a species is spotted on the same day on the same spot

Yes

> Combine occurrences of synonym species with the accepted species

Yes

Thresholds are:

  1. total number of records: at least 90
  2. percentage of records in Belgian climate zones (future or present): at least 20%

Additionally, records with incorrect locations that place them outside any climate zone are omitted. Similarly, records on the fringes of the climate zones can be omitted because the resolution of the climate zone grid (km squares) causes them to fall outside the climate zone.

Some further considerations: no limit was placed on coordinate uncertainty, so records with imprecise coordinates can end up in the wrong climate zone.
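For readers who want to see how the filters and thresholds listed above fit together, here is a minimal sketch. It is in Python rather than R and is not the trias climate matching function. The field names, the helper functions and the toy data are hypothetical, and the set of Belgian climate zones is reduced to a single placeholder class ("Cfb") purely for illustration. Merging synonyms with the accepted species is not covered.

```python
# Assumption: placeholder for the climate zones present in Belgium.
BELGIAN_ZONES = {"Cfb"}
MIN_YEAR = 1901    # occurrences from before 1901 are filtered out
MIN_TOTAL = 90     # threshold on the total number of records
MIN_PERC = 20.0    # threshold on the % of records in Belgian climate zones


def clean(records):
    """Apply the record-level filters and drop same-day, same-spot duplicates."""
    seen = set()
    kept = []
    for rec in records:
        if rec["occurrenceStatus"].upper() != "PRESENT":
            continue
        if rec["year"] is None or rec["year"] < MIN_YEAR:
            continue
        if rec["climate_zone"] is None:
            continue  # record falls outside every climate zone
        key = (rec["species"], rec["eventDate"], rec["lat"], rec["lon"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def summarise(records):
    """Per species: total records and percentage in Belgian climate zones."""
    totals, in_belgian = {}, {}
    for rec in records:
        sp = rec["species"]
        totals[sp] = totals.get(sp, 0) + 1
        if rec["climate_zone"] in BELGIAN_ZONES:
            in_belgian[sp] = in_belgian.get(sp, 0) + 1
    return {
        sp: {"n_total": n, "perc_belgian": 100.0 * in_belgian.get(sp, 0) / n}
        for sp, n in totals.items()
    }


def passes(summary):
    """Check both thresholds for one species summary."""
    return summary["n_total"] >= MIN_TOTAL and summary["perc_belgian"] >= MIN_PERC


def make_record(species, i, zone, year=2000, status="PRESENT"):
    """Build one hypothetical record; i only varies the latitude."""
    return {"species": species, "eventDate": f"{year}-06-15",
            "lat": 50.0 + 0.01 * i, "lon": 4.0, "climate_zone": zone,
            "year": year, "occurrenceStatus": status}


# Toy data: species "A" has 100 valid records (30 in a Belgian zone),
# species "B" has only 10.
records = [make_record("A", i, "Cfb" if i < 30 else "Dfb") for i in range(100)]
records.append(dict(records[0]))                          # duplicate of the first record
records.append(make_record("A", 200, "Cfb", year=1890))   # before 1901
records.append(make_record("A", 201, "Cfb", status="ABSENT"))
records += [make_record("B", i, "Cfb") for i in range(10)]

summary = summarise(clean(records))
selected = sorted(sp for sp, s in summary.items() if passes(s))
print(selected)
```

With these toy numbers, species "A" keeps 100 records of which 30% fall in the placeholder Belgian zone, so it passes both thresholds; species "B" fails on the total number of records despite all of them being in that zone.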