SanderDevisscher closed 3 years ago
Hello @SanderDevisscher, thank you for the analysis, I will check this today! I think the link for "47 species for the future climate" is not the right one: the species are the same as for the current climate. Is that expected?
I added the wrong link. Should be fixed now.
Thank you !
Which filters did you use to select occurrences? Only "Human observation", "Material sample" and "Machine observation"? Any time limit (>= 1950)? Or something else? Thanks in advance.
I retained all records that have an eventdate >= 1950, have coordinates, are either "Human observation", "Material sample" or "Machine observation", and have an occurrence status equal to "present".
After the climate matching (CM), I kept only the species that have at least 90 records and at least 20% of their records in Belgian climate zones. https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L244 & https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L247-L248
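As a rough illustration of the filters and thresholds described above, here is a minimal Python sketch. It is not the R code from the linked script: the field names (`event_date`, `basis_of_record`, and so on), the upper-case value spellings and both function names are assumptions made for the example, and it assumes the 20% share is computed over all Belgian climate zones together.

```python
from datetime import date

# Basis-of-record values kept by the filter (spelling is an assumption).
ALLOWED_BASIS = {"HUMAN_OBSERVATION", "MATERIAL_SAMPLE", "MACHINE_OBSERVATION"}


def keep_record(rec):
    """Return True if an occurrence record passes the selection filters."""
    return (
        rec["event_date"] is not None
        and rec["event_date"].year >= 1950          # event date from 1950 onwards
        and rec["lat"] is not None                  # record has coordinates
        and rec["lon"] is not None
        and rec["basis_of_record"] in ALLOWED_BASIS
        and rec["occurrence_status"] == "PRESENT"
    )


def passes_thresholds(n_total, n_in_belgian_zones, min_records=90, min_share=0.20):
    """Apply the two thresholds: enough records, and enough of them in Belgian zones."""
    if n_total < min_records:
        return False
    return n_in_belgian_zones / n_total >= min_share


example = {
    "event_date": date(1999, 5, 1),
    "lat": 50.8,
    "lon": 4.3,
    "basis_of_record": "HUMAN_OBSERVATION",
    "occurrence_status": "PRESENT",
}
print(keep_record(example))        # True
print(passes_thresholds(100, 20))  # True
print(passes_thresholds(89, 89))   # False: fewer than 90 records
```

Keeping the record filter and the species thresholds in separate functions mirrors the order in the thread: records are filtered first, and the thresholds are applied per species only after the climate matching.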
Thank you for this. So your criteria are :
Am I right ?
When I check some species, I always find slightly more occurrences than the n_totaal in your results. Is there another filter? Or are you perhaps removing occurrences with duplicate GPS coordinates?
Indeed, I remove records with duplicate coordinates; see https://github.com/inbo/riparias-prep/blob/66e00d28e81eac028b3156e964c49892e75d6b55/scripts/crayfish_koppen_geiger_matching.Rmd#L152.
This in fact makes n_total and n_climate nonsensical. I'll have to rework the flow so that this step happens only after the climate matching.
However, I'll still treat records at the same GPS coordinates on the same day as a single record.
The issue may be in the duplicate removal: sometimes the GPS coordinates are the same but the date is different, and I think such records are still counted as duplicates and deleted. I checked for Faxonius neglectus: I found 114 occurrences matching the criteria, 111 occurrences after removing duplicates on the same date, and finally 107 occurrences after also removing duplicates on different dates (107 matches your n_totaal). Is that possible, or am I missing something?
@RomainWilleput I've rewritten the distinct at L152 so that it removes duplicates only when a species is recorded on the same day at the same spot, instead of whenever a species is recorded at the same spot at any time. As a result, 2 extra species in the future scenario and 1 extra species in the current scenario pass the thresholds.
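The difference between the two deduplication rules can be sketched as follows. This is a plain-Python illustration, not the `distinct` call from the R script; the helper `distinct_by`, the field names and the coordinates and dates are all made up for the example.

```python
def distinct_by(records, key_fields):
    """Keep the first record for each unique combination of key_fields."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Invented records: same species, partly overlapping spots and dates.
records = [
    {"species": "Faxonius neglectus", "lat": 36.10, "lon": -94.20, "date": "2015-06-01"},
    {"species": "Faxonius neglectus", "lat": 36.10, "lon": -94.20, "date": "2015-06-01"},  # exact duplicate
    {"species": "Faxonius neglectus", "lat": 36.10, "lon": -94.20, "date": "2018-07-12"},  # same spot, other day
    {"species": "Faxonius neglectus", "lat": 36.55, "lon": -93.90, "date": "2018-07-12"},
]

# Old rule: same spot at any time counts as a duplicate.
by_spot = distinct_by(records, ["species", "lat", "lon"])
# New rule: only same spot on the same day counts as a duplicate.
by_spot_and_day = distinct_by(records, ["species", "lat", "lon", "date"])

print(len(by_spot), len(by_spot_and_day))  # 2 3
```

Adding the date to the key is what keeps the revisit of the same spot in 2018, which the old rule silently dropped.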
Thank you! I noticed that Cherax destructor, a usual suspect, has disappeared from the future climate list. Perhaps its Perc_Climate drops below 20%. Do you have the results for this species?
Currently I only export the list of species which pass the thresholds. I'll add an intermediate export with the unfiltered species list.
@RomainWilleput I've added an intermediate export of the species with at least 1 record in the Belgian climate zones before they are filtered by the thresholds.
Thank you! One last question, about the thresholds: is 90 occurrences the minimum needed for the CM analysis to work correctly, or was it set arbitrarily?
It's an arbitrary limit (see #7).
The CM analysis consists of a GIS intersect between GBIF observations and Köppen-Geiger shapes.
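To make the "GIS intersect" concrete, here is a toy Python sketch that assigns a point to a climate zone with a ray-casting point-in-polygon test. It only illustrates the idea: the actual workflow intersects GBIF observations with real Köppen-Geiger shapes in R, whereas the square "zones" below, their coordinates and the function names are purely hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(lon, lat, zones):
    """Return the code of the first zone containing the point, or None."""
    for code, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None


# Hypothetical, square "climate zones" for illustration only.
zones = {
    "Cfb": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Cfa": [(10, 0), (20, 0), (20, 10), (10, 10)],
}

print(classify(5, 5, zones))   # Cfb
print(classify(15, 5, zones))  # Cfa
print(classify(25, 5, zones))  # None
```

Because the matching is a purely geometric lookup per record, it works for any number of occurrences, which is consistent with the 90-record threshold being a choice rather than a technical requirement.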
Ok, thank you for your help !
Can you approve and merge this PR if everything is ok ?
We will discuss this with Adrien and Antoine on Monday morning, then I will approve if everything is OK!
Hello @SanderDevisscher, I just discussed the CM script with @adrienlatli and @Andumort, and almost everything looks good! There just seems to be a small problem with subspecies being kept separate from their main species. I think we will lose occurrences this way, and that it would be better to group all subspecies at the species level. For example:
Faxonius neglectus + Faxonius neglectus chaenodactylus + Faxonius neglectus neglectus + Orconectes neglectus neglectus (synonym issue here?) -> Faxonius neglectus
Cambarus bartonii + Cambarus bartonii bartonii + Cambarus bartonii cavatus -> Cambarus bartonii
Is it possible to do that?
Another thing: we haven't set the thresholds for n_totaal and perc_climate yet. Is it possible to have the complete list of GBIF species, including those that have no occurrences compatible with the Belgian climate, as well as those that have no occurrences on GBIF at all? This would allow us to look at some interesting statistics! Many thanks for your help with all of these analyses. Have a nice day, Romain
The following additional steps should be taken before approving the PR:
@Andumort I've added some logic to group synonyms/forms/subspecies into species, as you asked. First, I created a list containing only a scientificname, recalculated from the genus and specific epithet, and the taxonKey of the accepted species (obtained by filtering taxonRank for "species" and taxonomicStatus for "accepted"). Next, I merged this list with the data on the scientificname rewritten from the genus and specific epithet. This allows us to recalculate the taxonKey before removing duplicate/incomplete records.
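The grouping logic described above could look roughly like this in Python. It is a sketch of the idea rather than the R implementation: the taxon keys are invented, the function names are hypothetical, and keeping the original taxonKey when no accepted species is found is an assumption, since the thread does not say how such records are handled.

```python
def build_species_lookup(taxa):
    """Map 'Genus epithet' to the taxonKey of the accepted, species-rank taxon."""
    lookup = {}
    for taxon in taxa:
        if (taxon["taxonRank"].lower() == "species"
                and taxon["taxonomicStatus"].lower() == "accepted"):
            name = f'{taxon["genus"]} {taxon["specificEpithet"]}'
            lookup[name] = taxon["taxonKey"]
    return lookup


def regroup(records, lookup):
    """Rewrite each record's name and taxonKey at species level."""
    regrouped = []
    for rec in records:
        name = f'{rec["genus"]} {rec["specificEpithet"]}'
        # Assumption: fall back to the original key if no accepted species matches.
        key = lookup.get(name, rec["taxonKey"])
        regrouped.append({**rec, "scientificName": name, "taxonKey": key})
    return regrouped


# Invented keys, for illustration only.
taxa = [
    {"genus": "Faxonius", "specificEpithet": "neglectus",
     "taxonRank": "SPECIES", "taxonomicStatus": "ACCEPTED", "taxonKey": 1001},
    {"genus": "Faxonius", "specificEpithet": "neglectus",
     "taxonRank": "SUBSPECIES", "taxonomicStatus": "ACCEPTED", "taxonKey": 1002},
]
records = [
    {"genus": "Faxonius", "specificEpithet": "neglectus", "taxonKey": 1001},
    {"genus": "Faxonius", "specificEpithet": "neglectus", "taxonKey": 1002},
]

lookup = build_species_lookup(taxa)
regrouped = regroup(records, lookup)
print({rec["taxonKey"] for rec in regrouped})  # {1001}
```

Doing this before the duplicate removal matters: records of a subspecies and of its parent species only become comparable once they share the same taxonKey.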
I've also added a new export containing the taxonKeys of the species on the white list without gbif records. It can be found here: riparias-prep/data/interim/crayfish_missing_from_gbif.csv
Finally I moved the climate match filtering to allow for an export of all species even if they do not match with the Belgian climate.
Combined, these changes have the following effects. All 592 non-introduced species are now included in the 996 unique climate/species combinations for the present scenario and the 934 unique climate/species combinations for the future scenario. 106 species are missing from GBIF. When filtering for a match with the Belgian climate scenarios and the arbitrary thresholds, the script yields 28 species for the current climate and 53 species for the future climate.
Thanks a lot Sander, the procedure seems perfect and produces all the information needed. @RomainWilleput: do you agree with that, Romain?
Hello @SanderDevisscher ,
I have a major question regarding the future scenario analysis: why does the n_climate of a species for the same climate category change between the current and future scenarios? For example:
Current: Astacopsis franklinii, Cfb, 133
Future: Astacopsis franklinii, Cfb, 141
Normally, all occurrences recorded on GBIF are placed on the current climate map, and the future scenario analysis consists only of comparing these occurrences with the future Belgian climate zones (Cfa and Cfb). Did you use the future climate map (2076-2100) for the GBIF occurrences in the future scenario? If so, could you please explain why?
A second, less important question: compared to the previous results, the n_totaal of all species seems to have decreased slightly, while it should have either increased (through the merging of subspecies) or remained the same. For example:
Before: Astacopsis franklinii n_totaal = 167
Now: Astacopsis franklinii n_totaal = 148
Do you have any idea what caused this?
Finally, one last question: We selected Gbif occurrences between 1950 and 2021. Is this period compatible with the climate map you are using (1976-2000 or 1980-2016)? Would it be necessary to use an earlier climate map for occurrences before 1976/1980?
Many thanks in advance for your help, Have a nice day!
I have a major question regarding the future scenario analysis: why does the n_climate of a species for the same climate category change between the current and future scenarios? For example:
Current: Astacopsis franklinii, Cfb, 133
Future: Astacopsis franklinii, Cfb, 141
Normally, all occurrences recorded on GBIF are placed on the current climate map, and the future scenario analysis consists only of comparing these occurrences with the future Belgian climate zones (Cfa and Cfb). Did you use the future climate map (2076-2100) for the GBIF occurrences in the future scenario? If so, could you please explain why?
@RomainWilleput I did use the future climate map to remap the GBIF occurrences, which of course is not a good method. I'll fix it by mapping only to the current climate and applying a different climate filter for each scenario.
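The corrected approach, matching every occurrence to the present-day climate map once and then applying a different set of Belgian zones per scenario, can be sketched as below. This is an illustrative Python outline, not the R workflow: the future zones (Cfa and Cfb) come from the thread, while the current-climate zone set, the function name and the counts are assumptions made for the example.

```python
from collections import Counter

# Belgian climate zones per scenario. The "future" set is mentioned in the thread;
# the "current" set is an assumption for this example.
SCENARIO_ZONES = {
    "current": {"Cfb"},
    "future": {"Cfa", "Cfb"},
}


def summarise(matched, scenario_zones):
    """Count records per species in the Belgian zones of each scenario.

    matched: list of (species, zone) pairs, where zone always comes from the
    present-day climate map.
    Returns {scenario: {species: (n_climate, n_total)}}.
    """
    n_total = Counter(species for species, _ in matched)
    result = {}
    for scenario, zones in scenario_zones.items():
        n_climate = Counter(species for species, zone in matched if zone in zones)
        result[scenario] = {sp: (n_climate[sp], n_total[sp]) for sp in n_total}
    return result


# Invented example: 6 records of one species, matched to present-day zones.
matched = [("Species A", "Cfb")] * 3 + [("Species A", "Cfa")] * 2 + [("Species A", "Dfb")]

summary = summarise(matched, SCENARIO_ZONES)
print(summary["current"]["Species A"])  # (3, 6)
print(summary["future"]["Species A"])   # (5, 6)
```

With this structure, the number of records a species has in a given zone (for example Cfb) is identical in both scenarios; only the set of zones that counts as "Belgian" changes.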
A second, less important question: compared to the previous results, the n_totaal of all species seems to have decreased slightly, while it should have either increased (through the merging of subspecies) or remained the same. For example:
Before: Astacopsis franklinii n_totaal = 167
Now: Astacopsis franklinii n_totaal = 148
Do you have any idea what caused this?
This could be the result of 2 records on the same date at the same location that belonged to 2 separate subspecies/synonyms before the merge. Once they are merged into one species, they are considered duplicates and only 1 is retained.
We selected Gbif occurrences between 1950 and 2021. Is this period compatible with the climate map you are using (1976-2000 or 1980-2016)? Would it be necessary to use an earlier climate map for occurrences before 1976/1980?
Currently I use the 1980-2016 maps. I don't know whether this matches the climate between 1950 and 2021; I'll leave this open for discussion (see #19). However, I think most of the data is from recent years (I'll check this).
I've rewritten the workflow to do the climate matching with the present climate only, and then to filter according to both climate scenarios for Belgium.
This results in these lists:
The script looks good to me now! What remains is to choose the Köppen-Geiger climate maps according to the dates of the occurrences, if possible. Thanks a lot for your help!
Ok, can you approve & merge this PR ? When I have access to the different Koppen Geiger climate maps I'll implement the logic.
I'm quite new to GitHub, is it good like this?
Now it is ;-)
fixes #5 fixes #10
This PR adds climate matching to the crayfish script, based on the families (see https://github.com/inbo/riparias-prep/issues/5#issuecomment-812565701) instead of a white list. This enables climate matching for all crayfish species.
I've also rewritten the data manipulation to work with the acceptedScientificName and acceptedTaxonKey columns instead of the species and TaxonKey columns. This should eliminate any issues with synonyms, but can you check that this is indeed the case?
When doing climate matching for all species while maintaining the thresholds (see https://github.com/inbo/riparias-prep/issues/7#issuecomment-811718390), we get 24 species for the current climate and 47 species for the future climate.