Here is the figure we discussed, for the record.
During the meeting of 8 July 2019, we decided/thought the following:
… `gratia` and inspired by Harrison et al. 2014, JAPE)

This ends up with a ranking for each indicator, which will be shown as a synoptic table. Below is an example:
species | rank | 2013 | 2014 | 2015 | 2016 | 2017 |
---|---|---|---|---|---|---|
D | 1 | emerging | potentially emerging | emerging | emerging | emerging |
B | 2 | emerging | not emerging | emerging | emerging | emerging |
H | 3 | emerging | emerging | unclear | emerging | emerging |
W | 4 | emerging | emerging | not emerging | emerging | emerging |
Z | 5 | emerging | emerging | emerging | potentially emerging | emerging |
Y | 6 | emerging | emerging | emerging | emerging | unclear |
S | 7 | emerging | emerging | emerging | emerging | not emerging |
S | 7 | emerging | potentially emerging | emerging | emerging | not emerging |
In this way we account for the most recent year being more important in assessing the emerging character of a species. As we work with rankings, merging the indicators means merging their rankings, thus producing a new ranking. One strategy could be to calculate the final ranking from the sum of the two rankings, where the minimum sum wins.
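A minimal Python sketch of the hierarchical ranking in the table above and of the rank-sum merge. The numeric scores (emerging > potentially emerging > unclear > not emerging) are an assumption inferred from the example ranks, every species is assumed to have a label for every year, and the function names are hypothetical, not functions of the trias package.

```python
# Assumed ordinal scores, inferred from the example table: higher = more emerging.
SCORE = {"emerging": 3, "potentially emerging": 2, "unclear": 1, "not emerging": 0}


def hierarchical_rank(statuses: dict[str, dict[int, str]]) -> dict[str, int]:
    """Rank species on their labels in the most recent year, breaking ties with
    the previous year, and so on. Identical label histories share a rank."""
    years = sorted({y for labels in statuses.values() for y in labels}, reverse=True)

    def key(species: str) -> tuple[int, ...]:
        # Negate the scores so that an ascending sort puts "emerging" first.
        return tuple(-SCORE[statuses[species][y]] for y in years)

    ordered = sorted(statuses, key=key)
    ranks: dict[str, int] = {}
    for i, species in enumerate(ordered):
        if i > 0 and key(ordered[i - 1]) == key(species):
            ranks[species] = ranks[ordered[i - 1]]  # ex aequo
        else:
            ranks[species] = i + 1
    return ranks


def merge_by_rank_sum(rank_a: dict[str, int], rank_b: dict[str, int]) -> list[str]:
    """Merge two rankings (e.g. AOO and occurrences): the smallest rank sum wins."""
    return sorted(rank_a, key=lambda species: rank_a[species] + rank_b[species])
```

Fed with species D to S (first S row) of the table, `hierarchical_rank` reproduces ranks 1 to 7.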
In line with previous thoughts, we want to combine both rankings into one general indicator. Again, the reason to work with these two indicators is that a species can increase its AOO without a noticeable increase in occurrences, and vice versa.
We start from this table:
species | year | AOO | occurrence |
---|---|---|---|
A | 2017 | emerging | emerging |
A | 2016 | emerging | potentially emerging |
A | 2015 | not emerging | potentially emerging |
A | 2014 | unclear | potentially emerging |
B | 2017 | emerging | emerging |
B | 2016 | emerging | unclear |
B | 2015 | potentially emerging | unclear |
B | 2014 | not emerging | potentially emerging |
B | 2013 | not emerging | potentially emerging |
We define the ranking based on the labels in the most recent year (2017); then, if the same labels occur, we evaluate the second most recent year (2016), and so on.
In this way we get just one (!) ranking and we do not need to manipulate rankings. At the same time, we can still provide partial rankings based on each indicator as ancillary information. If I had to vote now, I would choose this strategy.
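A Python sketch of this second strategy. The text does not say how an (AOO, occurrence) pair of labels is ordered within a year, so this sketch assumes the pair is compared through the sum of the two ordinal scores; `combined_rank` is a hypothetical name.

```python
# Assumed ordinal scores (same ordering as in the example ranking table).
SCORE = {"emerging": 3, "potentially emerging": 2, "unclear": 1, "not emerging": 0}


def combined_rank(rows: list[tuple[str, int, str, str]]) -> list[str]:
    """rows: (species, year, AOO label, occurrence label).
    Rank on the most recent year first, then the previous year, and so on."""
    scores: dict[str, dict[int, int]] = {}
    for species, year, aoo, occ in rows:
        # Assumption: both labels of a year are combined by summing their scores.
        scores.setdefault(species, {})[year] = SCORE[aoo] + SCORE[occ]
    years = sorted({y for by_year in scores.values() for y in by_year}, reverse=True)

    def key(species: str) -> tuple[int, ...]:
        # A missing year counts as the lowest possible score (assumption).
        return tuple(-scores[species].get(y, 0) for y in years)

    return sorted(scores, key=key)
```

With the table above, A and B tie in 2017 (both emerging for AOO and occurrences); A comes first because its 2016 occurrence label (potentially emerging) outranks B's (unclear).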
I think we can only really evaluate these strategies with real data from known species. If we did this with data from the 20th century, would we see what we have now?
How does this approach resolve the issue of slowly mobilized data?
@qgroom: sure. Thanks to @ToonVanDaele we started to work on real data, and for that reason we could think further. We are working on it and we hope to have some results to discuss with you all in September. We limit the analysis to 2017 (next year we will include 2018 too) to avoid the drop-off in the most recent year. Correcting occurrences and AOO by dividing them by a baseline at class level will be taken into account; in this way, we hope to correct for research effort bias as well. The next step, at least for me, is to include @ToonVanDaele's code in the trias package.
@qgroom some graph outputs we used for the discussion yesterday are available in the TrIAS folder but need updating (e.g. including sampling bias correction). Some species I would suspect of being "emerging" are indeed flagged :-)
What do you mean by "how is this to solve the issue of slowly mobilized data"?
@damianooldoni @ToonVanDaele re "5. We should also add a kind of warning/label whether an alien species is found in protected areas." This can indeed be done and will be available through another occurrence indicator (see this issue). As we discussed, there are several options:
I don't think we decided how to proceed with that. Occurrence in protected areas is probably closely linked to what is outside those areas.
Here are two other usual suspects that are indeed flagged as emerging (but based on the last 3 years):
> How does this approach resolve the issue of slowly mobilized data?
I'm forgetting that you are going to correct by class.
To avoid confounding effects, the observations by class should not contain observations of the potentially invasive species. The observations of the invasive species are subtracted from the class observations in the pre-processing phase.
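A minimal Python sketch of this correction, assuming the yearly counts are divided by the class baseline as mentioned earlier in the thread; `corrected_series` is a hypothetical helper, not the pre-processing code itself.

```python
from typing import Optional


def corrected_series(taxon_obs: dict[int, int],
                     class_obs: dict[int, int]) -> dict[int, Optional[float]]:
    """Divide the yearly observations of a taxon by a class-level baseline.

    The baseline excludes the taxon itself (class observations minus taxon
    observations), so the taxon's own growth cannot inflate the baseline.
    Years with an empty baseline get None.
    """
    corrected: dict[int, Optional[float]] = {}
    for year, n_taxon in taxon_obs.items():
        baseline = class_obs.get(year, 0) - n_taxon
        corrected[year] = n_taxon / baseline if baseline > 0 else None
    return corrected
```

For example, a taxon going from 10 to 20 observations while the rest of its class goes from 100 to 200 gives a constant corrected value of 0.1, i.e. no growth beyond the increase in research effort.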
Code for ranking (2nd strategy deployed) and snapshot based on 19 species (test data from @ToonVanDaele): https://github.com/ToonVanDaele/trias-test/issues/4.
For tracking purposes: as shown during the TrIAS meeting in October, this is the general workflow diagram we are working with:
Emerging status scores assigned by GAM:
In case of score "Unclear" (too few data points to apply a GAM, or 0 within the confidence interval of the 1st and 2nd derivatives), we use a set of decision rules. Based on them, the possible outcomes are:
We don't include taxa with score "appearing/reappearing" in the ranking, as they are not comparable with other taxa. After bilateral discussions with @ToonVanDaele and @timadriaens, we thought to put them in a separate table. The three of us will discuss this further during our internal meeting on 18 Nov. Results will be discussed with the core team on Monday 25 Nov.
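The GAM-based scores mentioned above depend on the confidence intervals (CI) of the 1st derivative (growth) and 2nd derivative (acceleration). The Python sketch below shows the shape of such a mapping; only the "unclear" case (0 inside both CIs) comes from this thread, all other thresholds are hypothetical and not taken from `apply_gam`.

```python
def gam_status(lower1: float, upper1: float, lower2: float) -> str:
    """Map the CI of the 1st derivative (lower1, upper1) and the lower CI bound
    of the 2nd derivative (lower2) in an evaluation year to an emerging status.
    Hypothetical thresholds, except the "unclear" rule stated in the thread."""
    if lower1 > 0 and lower2 > 0:
        return "emerging"              # growing and accelerating
    if lower1 > 0 or (lower1 <= 0 <= upper1 and lower2 > 0):
        return "potentially emerging"  # growing, or starting to accelerate
    if upper1 < 0:
        return "not emerging"          # clearly decreasing
    return "unclear"                   # 0 within the 1st-derivative CI, no clear acceleration
```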
Based on the meeting with @timadriaens and @ToonVanDaele today:
Ideas to improve ranking, especially the ranking of the group of the "most" emerging species:
During a meeting I had with @ToonVanDaele, Hans Van Calster suggested running the GAM just once and taking the output for the years of interest (in our case 2016, 2017, 2018) instead of running the GAM three times: first on the time series up to 2016, then on the time series up to 2017 and finally on the time series up to 2018.
If I understood correctly, the reason is that the GAM outputs are not statistically independent, so there is no reason to make the model so complex and computationally demanding. @ToonVanDaele agreed, and he and I immediately implemented this "easier" approach. Still, @ToonVanDaele, we need your help to formulate this concept better in the near future.
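A Python sketch of the "fit once, read several years" idea. A quadratic least-squares fit stands in for the GAM here (it gives no confidence intervals and is not what `apply_gam` does); the point is only that a single fit on the full series yields the growth for 2016, 2017 and 2018. Function names are hypothetical.

```python
def fit_quadratic(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations (Cramer's rule)."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def det3(a: list[list[float]]) -> float:
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coefs = []
    for col in range(3):
        # Replace column `col` by the right-hand side t (Cramer's rule).
        mc = [[t[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        coefs.append(det3(mc) / d)
    return coefs[0], coefs[1], coefs[2]


def growth_in_years(years: list[int], values: list[float],
                    years_of_interest: list[int]) -> dict[int, float]:
    """Fit ONCE on the full series and read the 1st derivative (growth) in each
    year of interest, instead of refitting on truncated series."""
    origin = years[0]  # centre the x axis to keep the sums small
    xs = [float(y - origin) for y in years]
    _, b, c = fit_quadratic(xs, values)
    return {yr: b + 2 * c * (yr - origin) for yr in years_of_interest}
```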
Another update about the emerging status assessment: we use the lower bound of the confidence interval of the growth (1st derivative) of occupancy in 2018 as a way to rank the species considered emerging. As this value is continuous, ties (ex aequo) are resolved. @ToonVanDaele: I am very curious to see the output list. Meanwhile, I am making progress in polishing your code in this repo (not in master; I am working on branch occurrence-indicators).
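A short Python sketch of this tie-breaking, assuming each species comes as a (name, status, lower CI bound of the 2018 growth) tuple; `rank_emerging` is a hypothetical name.

```python
def rank_emerging(species: list[tuple[str, str, float]]) -> list[str]:
    """Keep the emerging taxa and sort them by the lower CI bound of their 2018
    growth, highest first; the bound is continuous, so ties are unlikely."""
    emerging = [s for s in species if s[1] == "emerging"]
    return [name for name, _, _ in sorted(emerging, key=lambda s: s[2], reverse=True)]
```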
The entire workflow for occurrence indicators is now online! :tada: Partial emerging statuses based on GAM and decision rules (link) and the final ranking (link) have been added. See #70 for more details about the changes and the work still to be done.
I thought about an alternative ranking strategy which is less strict (less hierarchical), as it is point-based, although it follows the same criteria discussed in this issue. You can find the table of the weights (gain factors) in section 4.2 of the pipeline. So, @timadriaens: it is up to you to decide whether this second strategy is preferable.
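A Python sketch of a point-based ranking for one indicator. The weights below are invented for illustration (the real gain factors are in section 4.2 of the pipeline), and with two indicators one would add up the points of both.

```python
# Assumed ordinal scores, as in the hierarchical ranking.
SCORE = {"emerging": 3, "potentially emerging": 2, "unclear": 1, "not emerging": 0}

# Hypothetical gain factors per evaluation year.
WEIGHTS = {2018: 3, 2017: 2, 2016: 1}


def points(labels_by_year: dict[int, str]) -> int:
    """Each yearly evaluation contributes its score times the weight of that year."""
    return sum(WEIGHTS[year] * SCORE[label] for year, label in labels_by_year.items())


def point_based_rank(statuses: dict[str, dict[int, str]]) -> list[str]:
    """Species sorted by total points, highest first (ties keep their input order)."""
    return sorted(statuses, key=lambda species: points(statuses[species]), reverse=True)
```

With these weights, a species emerging only in 2018 (9 points) ends up below one that is potentially emerging in 2018 but emerging in 2017 and 2016 (15 points), whereas a hierarchical ranking would put the former first.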
We use the last three full years (2018, 2017, 2016) as evaluation years, so the pipeline detecting (re)appearing taxa has been slightly changed and takes into account only the occurrences of the current year.
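A Python sketch of a (re)appearing check based on the current year. The exact definitions are not given here, so both the rules and the `gap_years` threshold are hypothetical.

```python
from typing import Optional


def appearing_status(occ_by_year: dict[int, int], current_year: int,
                     gap_years: int = 10) -> Optional[str]:
    """Flag a taxon as "appearing" or "reappearing" from its occurrences in the
    current year. gap_years is a hypothetical threshold: a taxon observed now,
    but not in the previous gap_years years, counts as reappearing."""
    if occ_by_year.get(current_year, 0) == 0:
        return None  # no occurrences in the current year
    past_years = [y for y, n in occ_by_year.items() if y < current_year and n > 0]
    if not past_years:
        return "appearing"  # never observed before
    if current_year - max(past_years) > gap_years:
        return "reappearing"
    return None  # present recently: left to the GAM / decision rules
```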
@timadriaens, @ToonVanDaele: we should also sit together to discuss the results and improve the GAM plots without making them too complex. The GAM uses the values of native species within the same class as a covariate, so the smoother in the plots doesn't follow the observed occurrence/occupancy, as part of the growth/decrease is due to the covariate. So you might think the GAM results are bad, but they are actually good.
@ToonVanDaele: could you please use the functions `apply_gam` and `em_status_dr` and review them? In the meantime I will move them to the trias package, adding documentation and rigorous unit tests. At this stage of development your suggestions are more than welcome. I also simplified the decision rules/tree.
I kept @ToonVanDaele's original numbering to ease the review process. I will rename them if you think they are good.
See also the two different ranking methods: the hierarchical one (which works like an Olympic medal table) and the point-based system, where each partial evaluation contributes (with a different weight) to the final score.
Today, based on a parliamentary question and through @timadriaens, I found that we don't provide graphs of observations and occupancy when the GAM cannot be used. This is a pity, as we have these data and should show them even if we cannot add the GAM prediction as an additional layer. As I am working on adding `apply_gam` as a trias function, I will solve this at function level.
We had already agreed on that (raw data are always needed for interpretation) at the previous meeting (see #53) but just had not got to it yet (and perhaps did not put it on GitHub). Indeed, the plan is to show at least the following graphs:
We need this for all species, not only the emerging ones.
It would also be good to have, per species, a small data frame with the "emergence" indicators, so that this is available and can be shown per species. It should make clear whether the status is based on decision rules or on the GAM.
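A possible row layout for such a per-species table, sketched as a Python dataclass; the field names and allowed values are hypothetical, the key point being an explicit `method` column.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class EmergenceRecord:
    """One row of a per-species emergence table (hypothetical layout)."""
    species: str
    year: int
    indicator: Literal["observations", "occupancy"]
    status: str
    method: Literal["GAM", "decision rules"]  # shows what the status is based on


def species_table(records: list[EmergenceRecord], species: str) -> list[EmergenceRecord]:
    """Rows for one species, most recent year first."""
    rows = [r for r in records if r.species == species]
    return sorted(rows, key=lambda r: (-r.year, r.indicator))
```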
Other ideas for visualization/graphs/output tables are welcome @damianooldoni @ToonVanDaele
@ToonVanDaele has made some more attractive graphs for the emerging species series of TrIAS Aware in Natuur.focus. Perhaps we can base the layout on those graphs and use his code to that end. Bearing in mind that we are going to put that information on a website for the public, I feel they should look smashing.
examples pulled together with @damianooldoni for Dama dama
and some graphs @ToonVanDaele prepared
@timadriaens, @ToonVanDaele: interesting. As already said, we need to sit together for half a day and produce graphs so we can find the right output and the best smashing style :+1: At the same time we also have to solve this visualization issue: https://github.com/ToonVanDaele/trias-test/issues/10. I'll check your agendas online and make a proposal.
Meanwhile, I will correct the baseline data. Up to now we used the number of native species instead of ALL data at class level minus the observations of the taxon under examination.
This long issue has been tackled and can be closed. :+1:
This issue describes a part of the general workflow for assessing the emerging status of alien species, as discussed on Friday, 15 Feb 2019 by @damianooldoni, @timadriaens and @ToonVanDaele.
Input data
We start from the output of the occ-processing repository called `cube_belgium.csv`, as mentioned in https://github.com/trias-project/occ-processing/issues/3. This file contains occurrences with (at least) the following key columns:

- `taxonKey`
- `speciesKey`
- `kingdomKey`
- `year`
- `CELLCODE` (grid id from the European Environment Agency reference grid)

Grouping by `speciesKey` and `year`, we get the number of occurrences per year (x: year, y: n_occs). We work at year level; no more detailed temporal information is used. The research effort bias of the area of occupancy (AOO) is already corrected at this stage (for details about research effort bias correction, see #46). Working at species level may not always be appropriate; this is discussed separately (see https://github.com/trias-project/unified-checklist/issues/35).

AOO and occurrences are time series (x: year, y: occurrences or y: AOO). Although we could have data before 1950, we start the analysis from 1950, the birth date of invasion ecology (cit. @timadriaens :smiley: ).
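A Python sketch of this grouping step on cube rows. It assumes each row carries a count field named `n` (hypothetical name for the number of occurrences in that cell and year) and expresses AOO as the number of occupied grid cells.

```python
from collections import defaultdict


def occurrences_and_aoo(cube_rows, start_year=1950):
    """Aggregate cube rows to one (n_occs, AOO) pair per (speciesKey, year).

    Each row is a dict with the keys speciesKey, year and CELLCODE plus a
    hypothetical count field "n". AOO is the number of distinct occupied cells.
    """
    n_occs = defaultdict(int)
    cells = defaultdict(set)
    for row in cube_rows:
        if row["year"] < start_year:  # the analysis starts in 1950
            continue
        key = (row["speciesKey"], row["year"])
        n_occs[key] += row["n"]
        cells[key].add(row["CELLCODE"])
    return {key: (n_occs[key], len(cells[key])) for key in n_occs}
```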
Limit cases
Segmented regression
After extracting the limit cases, we set occ and AOO equal to zero for years with no occurrences, as only years with occurrences are present in the cube. Segmented regression will be applied to the AOO and occ time series separately. So, for each of the two time series and for each year, the slope of the last segment and its confidence interval are evaluated and turned into a categorical variable. We can have three situations:
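The zero-filling and the categorisation of the last-segment slope can be sketched in Python as follows. The segmented regression itself is not reproduced (its slope CI is taken as input), and the three category names are hypothetical labels for the situations.

```python
def fill_missing_years(series: dict[int, float], first_year: int,
                       last_year: int) -> dict[int, float]:
    """The cube only contains years with occurrences: set all other years to 0."""
    return {year: series.get(year, 0) for year in range(first_year, last_year + 1)}


def slope_category(lower: float, upper: float) -> str:
    """Turn the confidence interval of the slope of the last segment into a
    category (hypothetical names for the three situations)."""
    if lower > 0:
        return "increasing"
    if upper < 0:
        return "decreasing"
    return "no clear trend"  # 0 within the confidence interval
```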
Emerging decision table at year level
For each year and species we can then apply a decision table to define the emerging status of the species:
This will end up in an output like this:
Next steps: how do we aggregate these emerging labels in order to estimate the general emerging status of a species? My two cents: as our analysis is future-oriented, the emerging status in the recent past should definitely weigh more in the final decision than the status in the distant past.
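One way to let recent years weigh more, sketched in Python: a weighted mean of the yearly scores with exponentially decaying weights. Both the ordinal scores and the decay factor are assumptions for illustration.

```python
# Assumed ordinal scores: higher = more emerging.
SCORE = {"emerging": 3, "potentially emerging": 2, "unclear": 1, "not emerging": 0}


def recency_weighted_score(labels_by_year: dict[int, str], decay: float = 0.5) -> float:
    """Weighted mean of the yearly scores; a year k years before the most recent
    one gets weight decay**k (hypothetical exponential weighting)."""
    latest = max(labels_by_year)
    total = weight_sum = 0.0
    for year, label in labels_by_year.items():
        weight = decay ** (latest - year)
        total += weight * SCORE[label]
        weight_sum += weight
    return total / weight_sum
```

With `decay=0.5`, "emerging" in 2018 after "not emerging" in 2017 scores 2.0, while the reverse order scores only 1.0.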
@ToonVanDaele, @timadriaens: please comment if you think I missed something or if you have new thoughts about it.