inbo / riparias-prep

Preparatory scripts and data management for the RIPARIAS workflow
MIT License

5 all crayfish cm #17

Closed SanderDevisscher closed 3 years ago

SanderDevisscher commented 3 years ago

fixes #5 fixes #10

This PR adds climate matching to the crayfish script based on families (see https://github.com/inbo/riparias-prep/issues/5#issuecomment-812565701) instead of a white list. This enables climate matching for all crayfish species.

I've also rewritten the data manipulation to work with the acceptedScientificName and acceptedTaxonKey columns instead of the species and taxonKey columns. This should eliminate any issues with synonyms. Could you check that this is indeed the case?

When doing climate matching for all species (while maintaining the thresholds (see https://github.com/inbo/riparias-prep/issues/7#issuecomment-811718390)) we get 24 species for the current climate and 47 species for the future climate.

RomainWilleput commented 3 years ago

Hello @SanderDevisscher, thank you for the analysis, I will check this today! I think the link for "47 species for the future climate" is not the right one: the species are the same as for the current climate. Is that expected?

SanderDevisscher commented 3 years ago

> I think the link for "47 species for the future climate" is not the right one: the species are the same as for the current climate. Is that expected?

I added the wrong link. Should be fixed now.

RomainWilleput commented 3 years ago

Thank you !

RomainWilleput commented 3 years ago

Which filters did you use to select occurrences? Only "Human observation", "Material sample" and "Machine observation"? Any time limit (>= 1950)? Or something else? Thanks in advance.

SanderDevisscher commented 3 years ago

> Which filters did you use to select occurrences?

I kept all records that have an eventDate >= 1950, have coordinates, are either "Human observation", "Material sample" or "Machine observation", and have an occurrence status equal to "present".

https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L140-L146
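For readers who do not want to open the R Markdown file, the four filters described above can be sketched in a few lines of Python. This is only an illustration, not the code at the linked lines: the field names (`event_year`, `latitude`, `longitude`, `basis_of_record`, `occurrence_status`) and the sample records are assumptions made for the example.

```python
# Basis-of-record labels as written in the comment above.
ALLOWED_BASIS = {"Human observation", "Material sample", "Machine observation"}


def keep_record(record):
    """Return True when a record passes all four filters."""
    has_coordinates = (
        record.get("latitude") is not None
        and record.get("longitude") is not None
    )
    return (
        record.get("event_year") is not None
        and record["event_year"] >= 1950
        and has_coordinates
        and record.get("basis_of_record") in ALLOWED_BASIS
        and record.get("occurrence_status") == "present"
    )


def make_record(**overrides):
    """Build a made-up record that passes every filter unless overridden."""
    record = {
        "event_year": 1975,
        "latitude": 50.8,
        "longitude": 4.4,
        "basis_of_record": "Human observation",
        "occurrence_status": "present",
    }
    record.update(overrides)
    return record


records = [
    make_record(),                                      # passes
    make_record(event_year=1930),                       # too old
    make_record(latitude=None),                         # no coordinates
    make_record(basis_of_record="Preserved specimen"),  # basis not allowed
    make_record(occurrence_status="absent"),            # not present
]

kept = [record for record in records if keep_record(record)]
print(len(kept), "of", len(records), "records kept")
```

Only the first sample record survives; each of the others fails exactly one filter.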

After the climate matching I kept only those species that have at least 90 records and at least 20% of their records in Belgian climate zones. https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L244 & https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L247-L248
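The threshold step can be sketched as follows. The names `n_totaal`, `n_climate` and `perc_climate` follow the columns mentioned in this thread. The counts are made up, and treating both limits as inclusive ("at least") is an assumption rather than something checked against the script.

```python
MIN_RECORDS = 90       # minimum number of records per species
MIN_PERC_CLIMATE = 20  # minimum % of records in Belgian climate zones


def passes_thresholds(n_totaal, n_climate):
    """Return True when a species meets both thresholds (inclusive)."""
    if n_totaal == 0:
        return False
    perc_climate = 100 * n_climate / n_totaal
    return n_totaal >= MIN_RECORDS and perc_climate >= MIN_PERC_CLIMATE


# Made-up counts: (n_totaal, n_climate) per hypothetical species.
counts = {
    "Species A": (120, 60),  # enough records, 50% in Belgian zones
    "Species B": (80, 70),   # too few records
    "Species C": (200, 10),  # enough records, only 5% in Belgian zones
}

selected = [
    name
    for name, (n_totaal, n_climate) in counts.items()
    if passes_thresholds(n_totaal, n_climate)
]
print(selected)
```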

RomainWilleput commented 3 years ago

Thank you for this. So your criteria are :

Am I right ?

When I check some species, I always find slightly more occurrences than the n_totaal in your results. Is there another filter? Or are you perhaps removing occurrences with duplicate GPS coordinates?

SanderDevisscher commented 3 years ago

> Or are you perhaps removing occurrences with duplicate GPS coordinates?

Indeed, I remove records with duplicate coordinates (see https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L152 ).

This in fact makes n_totaal and n_climate nonsensical. I'll have to rework the flow so that this step only happens after the climate matching.

SanderDevisscher commented 3 years ago

However, I'll still treat records at the same GPS coordinates on the same day as a single record.

RomainWilleput commented 3 years ago

Maybe the issue is in how duplicates are removed: sometimes the GPS coordinates are the same but the date is different, and I think such records are still counted as duplicates and deleted. I checked Faxonius neglectus: I found 114 occurrences matching the criteria, 111 after removing duplicates on the same date, and finally 107 after also removing duplicates on different dates (107 matches your n_totaal). Is that possible, or am I missing something?

SanderDevisscher commented 3 years ago

@RomainWilleput I've rewritten the distinct at L152 so that duplicates are removed only when a species is recorded at the same spot on the same day, instead of whenever a species is recorded at the same spot at any time. As a result, 2 extra species in the future scenario and 1 extra species in the current scenario get past the thresholds.
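The difference between the two rules comes down to which columns define a duplicate. Here is a small Python sketch, not the R code at L152; the helper `distinct_by`, the field names and the sample records are made up for illustration.

```python
def distinct_by(records, keys):
    """Keep the first record for each distinct combination of `keys`."""
    seen = set()
    result = []
    for record in records:
        signature = tuple(record[key] for key in keys)
        if signature not in seen:
            seen.add(signature)
            result.append(record)
    return result


# Made-up records for one hypothetical species.
records = [
    {"species": "Species A", "lat": 50.1, "lon": 4.2, "date": "2010-05-01"},
    {"species": "Species A", "lat": 50.1, "lon": 4.2, "date": "2010-05-01"},  # same spot, same day
    {"species": "Species A", "lat": 50.1, "lon": 4.2, "date": "2015-07-12"},  # same spot, other day
    {"species": "Species A", "lat": 51.0, "lon": 3.7, "date": "2010-05-01"},  # other spot
]

# Old rule: same species at the same spot, regardless of date.
old_rule = distinct_by(records, ["species", "lat", "lon"])
# New rule: same species at the same spot on the same day.
new_rule = distinct_by(records, ["species", "lat", "lon", "date"])

print(len(old_rule), len(new_rule))
```

With the date included in the key, the revisit on another day is no longer dropped, which is why the counts go up slightly under the new rule.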

RomainWilleput commented 3 years ago

Thank you ! I noticed that Cherax destructor, a usual suspect, disappeared from the future climate list. It may be the Perc_Climate which drops below 20%. Do you have the results for this species?

SanderDevisscher commented 3 years ago

> I noticed that Cherax destructor, a usual suspect, disappeared from the future climate list. It may be the Perc_Climate which drops below 20%. Do you have the results for this species?

Currently I only export the list of species which pass the thresholds. I'll add an intermediate export with the unfiltered species list.

SanderDevisscher commented 3 years ago

@RomainWilleput I've added an intermediate export of the species with at least 1 record in the Belgian climate zones before they are filtered by the thresholds.

future scenario current scenario

RomainWilleput commented 3 years ago

Thank you! One last question about the thresholds: is 90 occurrences the minimum needed for the CM analysis to work properly, or was it set arbitrarily?

SanderDevisscher commented 3 years ago

> Is 90 occurrences the minimum needed for the CM analysis to work properly, or was it set arbitrarily?

It's an arbitrary limit (see #7).

SanderDevisscher commented 3 years ago

This CM analysis consists of a GIS intersect between GBIF observations and Köppen-Geiger shapes.
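To make "GIS intersect" concrete: each observation point is assigned the climate zone whose polygon contains it. The sketch below uses a plain ray-casting point-in-polygon test in Python as a stand-in for the spatial join a GIS library would perform. The two rectangular "zones" are invented for the example and are not real Köppen-Geiger shapes.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    `polygon` is a list of (lon, lat) vertices, without repeating the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Invented rectangular zones, for illustration only.
zones = {
    "Cfb": [(0, 45), (10, 45), (10, 55), (0, 55)],
    "Cfa": [(10, 45), (20, 45), (20, 55), (10, 55)],
}


def climate_zone(lon, lat):
    """Return the code of the first zone containing the point, or None."""
    for code, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None


print(climate_zone(4.4, 50.8), climate_zone(15.0, 50.0), climate_zone(30.0, 10.0))
```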

RomainWilleput commented 3 years ago

Ok, thank you for your help !

SanderDevisscher commented 3 years ago

Can you approve and merge this PR if everything is ok ?

RomainWilleput commented 3 years ago

We will discuss this with Adrien and Antoine Monday morning, then I will approve if everything is ok!

RomainWilleput commented 3 years ago

Hello @SanderDevisscher, I just discussed the CM script with @adrienlatli and @Andumort, and almost everything looks good! There is just one small problem: subspecies are kept separate from their parent species. I think we lose occurrences this way, and that it would be better to group all subspecies at the species level. For example:

Faxonius neglectus + Faxonius neglectus chaenodactylus + Faxonius neglectus neglectus + Orconectes neglectus neglectus (synonyms issue here?) -> Faxonius neglectus

Cambarus bartonii + Cambarus bartonii bartonii + Cambarus bartonii cavatus -> Cambarus bartonii

Is it possible to do that?

Another thing: we haven't set the thresholds for n_totaal and perc_climate yet. Is it possible to get the complete list of species from GBIF, including those that have no occurrences compatible with the Belgian climate, as well as those that have no occurrences on GBIF at all? This would allow us to look at some interesting statistics! Many thanks for your help with all of these analyses. Have a nice day, Romain

SanderDevisscher commented 3 years ago

The following additional steps should be taken before approving the PR:

SanderDevisscher commented 3 years ago

@Andumort I've added some logic to group synonyms/forms/subspecies into species like you asked. I did this by first creating a list containing only a scientificName, recalculated from the genus and specific epithet, and the taxonKey of the accepted species (selected by filtering the taxonRank for "species" and the taxonomicStatus for "accepted"). Next I merged this list with the data on the rewritten scientificName, again built from the genus and the specific epithet. This then allows us to recalculate the taxonKey before removing duplicate/incomplete records.
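The grouping logic described above can be sketched as follows. This is a Python illustration of the idea, not the R code in the script: the field names are assumptions, the taxon keys are invented, and the sample occurrences are made up.

```python
def binomial(row):
    """Rebuild the species-level name from genus and specific epithet."""
    return f"{row['genus']} {row['specific_epithet']}"


# Taxon list with invented keys.
taxa = [
    {"taxon_key": 1, "genus": "Faxonius", "specific_epithet": "neglectus",
     "taxon_rank": "species", "taxonomic_status": "accepted"},
    {"taxon_key": 2, "genus": "Faxonius", "specific_epithet": "neglectus",
     "taxon_rank": "subspecies", "taxonomic_status": "accepted"},
    {"taxon_key": 3, "genus": "Cambarus", "specific_epithet": "bartonii",
     "taxon_rank": "species", "taxonomic_status": "accepted"},
]

# Step 1: lookup from species-level name to the accepted species key.
species_keys = {
    binomial(taxon): taxon["taxon_key"]
    for taxon in taxa
    if taxon["taxon_rank"] == "species"
    and taxon["taxonomic_status"] == "accepted"
}

# Made-up occurrences; the second one is recorded under the subspecies key.
occurrences = [
    {"taxon_key": 1, "genus": "Faxonius", "specific_epithet": "neglectus",
     "lat": 36.5, "lon": -94.0, "date": "2005-06-01"},
    {"taxon_key": 2, "genus": "Faxonius", "specific_epithet": "neglectus",
     "lat": 36.5, "lon": -94.0, "date": "2005-06-01"},
    {"taxon_key": 3, "genus": "Cambarus", "specific_epithet": "bartonii",
     "lat": 40.0, "lon": -78.0, "date": "2011-09-15"},
]

# Step 2: rewrite name and key at species level.
for occurrence in occurrences:
    name = binomial(occurrence)
    occurrence["scientific_name"] = name
    occurrence["taxon_key"] = species_keys.get(name, occurrence["taxon_key"])

# Step 3: remove duplicates (same species, spot and day) after the merge.
unique = {
    (o["taxon_key"], o["lat"], o["lon"], o["date"]) for o in occurrences
}
print(len(occurrences), "records,", len(unique), "after deduplication")
```

Note that two records that were distinct while they carried different subspecies keys can become identical once both carry the species key, so the merge followed by deduplication can also lower a species' record count.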

I've also added a new export containing the taxonKeys of the species on the white list without gbif records. It can be found here: riparias-prep/data/interim/crayfish_missing_from_gbif.csv

Finally I moved the climate match filtering to allow for an export of all species even if they do not match with the Belgian climate.

Combined, these changes have the following effects. All 592 non-introduced species are now included in the 996 unique climate/species combinations for the present scenario and the 934 unique climate/species combinations for the future scenario. 106 species are missing from GBIF. When filtering for a match with the Belgian climate scenarios and the arbitrary thresholds, the script results in 28 species for the current climate and 53 species for the future climate.

adrienlatli commented 3 years ago

Thanks a lot Sander, the procedure seems perfect and produces all the information needed. @RomainWilleput: do you agree, Romain?

RomainWilleput commented 3 years ago

Hello @SanderDevisscher ,

I have a major question regarding the future scenario analysis: why does the n_climate of a species for the same climate category change between the current and the future scenario? For example:

Current: Astacopsis franklinii Cfb 133
Future: Astacopsis franklinii Cfb 141

Normally all occurrences recorded on GBIF are placed on the current climate map, and the analysis of the future scenario consists only of comparing these occurrences with the future Belgian climate zones (Cfa and Cfb). Did you use the future climate map (2076-2100) for the GBIF occurrences in the future scenario? If so, could you please explain why?

A second, less important question: compared to the previous results, it seems that the n_totaal of all species has decreased slightly, while it should have either increased (by merging subspecies) or remained the same. For example:

Before: Astacopsis franklinii n_totaal = 167
Now: Astacopsis franklinii n_totaal = 148

Do you have any idea what caused this?

Finally, one last question: We selected Gbif occurrences between 1950 and 2021. Is this period compatible with the climate map you are using (1976-2000 or 1980-2016)? Would it be necessary to use an earlier climate map for occurrences before 1976/1980?

Many thanks in advance for your help, Have a nice day!

SanderDevisscher commented 3 years ago

> I have a major question regarding the future scenario analysis: why does the n_climate of a species for the same climate category change between the current and the future scenario? For example Current: Astacopsis franklinii Cfb 133, Future: Astacopsis franklinii Cfb 141. Normally all occurrences recorded on GBIF are placed on the current climate map, and the analysis of the future scenario consists only of comparing these occurrences with the future Belgian climate zones (Cfa and Cfb). Did you use the future climate map (2076-2100) for the GBIF occurrences in the future scenario? If so, could you please explain why?

@RomainWilleput I did use the future climate map to remap the GBIF occurrences, which of course is not a good method. I'll fix it by mapping the occurrences to the current climate only and applying a different climate filter for each scenario.

> A second, less important question: compared to the previous results, it seems that the n_totaal of all species has decreased slightly, while it should have either increased (by merging subspecies) or remained the same. For example Before: Astacopsis franklinii n_totaal = 167, Now: Astacopsis franklinii n_totaal = 148. Do you have any idea what caused this?

This could happen when two records share the same date and location but, before the merge, belonged to two separate subspecies/synonyms. After merging these into one species, the records are considered duplicates and only one is kept.

> We selected Gbif occurrences between 1950 and 2021. Is this period compatible with the climate map you are using (1976-2000 or 1980-2016)? Would it be necessary to use an earlier climate map for occurrences before 1976/1980?

Currently I use the 1980-2016 maps. I don't know whether these match the climate between 1950 and 2021, so I'll leave this open for discussion (see #19). However, I think most of the data is from recent years (I'll check this).

SanderDevisscher commented 3 years ago

I've rewritten the workflow to do the climate matching with the present climate only and then filter according to both climate scenarios for Belgium.
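In outline, the reworked flow assigns each occurrence a present-day climate zone once, and only the set of "Belgian" zones differs per scenario. A minimal Python sketch under stated assumptions: the future zone set (Cfa and Cfb) comes from this thread, while the current zone set shown here is a placeholder assumption, and the occurrences are made up.

```python
from collections import Counter

# Belgian climate zones per scenario.
# "future" is taken from the discussion above; "current" is a placeholder.
SCENARIO_ZONES = {
    "current": {"Cfb"},
    "future": {"Cfa", "Cfb"},
}

# Made-up occurrences as (species, present-day climate zone) pairs.
occurrences = [
    ("Species A", "Cfb"),
    ("Species A", "Cfb"),
    ("Species A", "Cfa"),
    ("Species A", "Dfb"),
    ("Species B", "Cfa"),
    ("Species B", "BSk"),
]


def counts_per_scenario(occurrences, scenario_zones):
    """Return {scenario: {species: (n_climate, n_totaal)}}."""
    n_totaal = Counter(species for species, _ in occurrences)
    result = {}
    for scenario, zones in scenario_zones.items():
        n_climate = Counter(
            species for species, zone in occurrences if zone in zones
        )
        result[scenario] = {
            species: (n_climate[species], total)
            for species, total in n_totaal.items()
        }
    return result


counts = counts_per_scenario(occurrences, SCENARIO_ZONES)
print(counts)
```

Because the zone of each occurrence is fixed before the scenario filter is applied, the number of records of a species in a given zone can no longer differ between scenarios. Only the set of zones that count as Belgian changes.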

This results in these lists:

RomainWilleput commented 3 years ago

The script looks good to me now! What remains is to choose the Köppen-Geiger climate maps according to the dates of the occurrences, if possible. Thanks a lot for your help.

SanderDevisscher commented 3 years ago

> The script looks good to me now! What remains is to choose the Köppen-Geiger climate maps according to the dates of the occurrences, if possible. Thanks a lot for your help.

Ok, can you approve & merge this PR ? When I have access to the different Koppen Geiger climate maps I'll implement the logic.

RomainWilleput commented 3 years ago

I'm quite new to GitHub, is it OK like this?

SanderDevisscher commented 3 years ago

Now it is ;-)