Closed: timadriaens closed this 5 years ago
For reference, this is how the figure looked in the previous indicator report:
Due to lack of habitat information I opted to just create the cumulative sum of alien species. The script is based on earlier work from Stijn (see https://github.com/inbo/natuurindicatoren/blob/master/src/bedreiging-door-nieuwe-uitheemse-diersoorten.Rmd) and Damiano and calculates how many species are present in a given year. Presence for a given year is determined by checking, per species, if the year falls between the startDate (first observation) and endDate (last observation) for that species.
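A minimal sketch of the presence check described above, in base R. The toy data and the names `species` and `counts` are made up for illustration; only `startDate` and `endDate` come from the checklist data, and this is not the script linked above.

```r
# Toy data (hypothetical): one row per species with first and last observation year
species <- data.frame(
  key = 1:4,
  startDate = c(1950, 1960, 1960, 1975),
  endDate = c(1980, 1965, 2000, 2000)
)

# All years covered by at least one species range
years <- seq(min(species$startDate), max(species$endDate))

# A species counts as present in a year if startDate <= year <= endDate
n_present <- vapply(
  years,
  function(y) sum(species$startDate <= y & y <= species$endDate),
  integer(1)
)

counts <- data.frame(year = years, n = n_present)
```

With these toy ranges, three species are present in 1962 and two in 1990, so `counts` can go straight into a line or bar graph.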
@timadriaens, @qgroom, @SoVDH should determine whether to use a line or bar graph.
English line graph:
English bar graph:
Suggestion: when using a bar graph, the data should be grouped (for example by decade).
We just need to start the graph much later. For policy purposes we could start somewhere between 1900 and 1950. It might be interesting to split the curve into plants, vertebrates, invertebrates, and fungi and others.
As this is relevant for the other indicators as well, @SanderDevisscher and @Yasmine-Verzelen, make sure this is a separately defined variable, e.g.:
start_year_plot <- 1945
@SanderDevisscher and @Yasmine-Verzelen, I would also agree on fixed steps for the year x-scale, e.g.:
x_scale_stepsize <- 10
scale_x_continuous(breaks = seq(start_year_plot, max(data$startDate), x_scale_stepsize))
We can include this idea in each indicator graph. @Yasmine-Verzelen, make x_scale_stepsize an input argument of your function as well (with default 10).
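A rough sketch of how both settings could become function arguments with defaults. `year_breaks` is a hypothetical helper name, not something from the pipeline code; only `start_year_plot` and `x_scale_stepsize` come from the comments above.

```r
# Hypothetical helper: x-axis label positions for the indicator plots
year_breaks <- function(years, start_year_plot = 1945, x_scale_stepsize = 10) {
  # Labels run from the chosen start year up to the last year in the data
  seq(start_year_plot, max(years, na.rm = TRUE), by = x_scale_stepsize)
}

breaks_default <- year_breaks(c(1950, 1987, 2018))
breaks_5_years <- year_breaks(c(1950, 1987, 2018), x_scale_stepsize = 5)
```

The result could be passed to `scale_x_continuous(breaks = ...)`. It only changes where the axis labels are placed, not how the counts are aggregated.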
10 years would make it a very insensitive indicator. In policy terms 10 years is a long time. I would not go longer than 5 years unless it is absolutely necessary.
We are talking about labels only, not aggregations, so that will be ok ;-)
compare from 1800:
versus from 1900:
versus from 1950:
@qgroom, does this fit with your idea of making it more policy-oriented?
@timadriaens can you check as well?
Indeed! @LienReyserhove, @timadriaens and I need to work more on getting species into the unified checklist. Some of the steep rise is the result of increased prospecting for plants. This will not be so distinct as we add more taxa.
Yesterday, we were a bit stuck on how to easily add the facet options to the graph. I revisited the implementation and I think we can solve this by altering the data set, adding individual rows for each year in the start/end range. By doing so, we can simply count rows (group_by() %>% count()), both for all records and when taking into account the facet/group_by of a category.
The concept is as follows:
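library(dplyr)  # rowwise(), do(), bind_cols()
library(tidyr)  # unnest()

test <- data.frame(start = c(1204, 1202, 1201, 1201),
                   end = c(1208, 1208, 1208, 1208),
                   key = 1:4,
                   group = c("a", "a", "b", "b"))
# Add one row per year between start and end (inclusive) for each record
test <- test %>%
  rowwise() %>%
  do(year = .data$start:.data$end) %>%
  bind_cols(test) %>%
  unnest(year)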
with the output:
start end key group year
<dbl> <dbl> <int> <fct> <int>
1 1204 1208 1 a 1204
2 1204 1208 1 a 1205
3 1204 1208 1 a 1206
4 1204 1208 1 a 1207
5 1204 1208 1 a 1208
6 1202 1208 2 a 1202
7 1202 1208 2 a 1203
8 1202 1208 2 a 1204
As such, the problem is a regular counting problem, grouped by year or any other variable you want to have grouped counts, e.g.
test %>%
group_by(year, group) %>%
count()
resulting in
year group n
<int> <fct> <int>
1 1201 b 2
2 1202 a 1
3 1202 b 2
4 1203 a 1
5 1203 b 2
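For comparison, a sketch of the same expand-then-count idea wrapped in one function with an optional facet column, written in base R so it does not depend on `do()`. `count_per_year` and its `facet_column` argument are hypothetical names for illustration and not the pipeline implementation.

```r
# Hypothetical helper: expand each record to one row per year, then count rows
count_per_year <- function(df, facet_column = NULL) {
  # Repeat every record once for each year in its start/end range (inclusive)
  n_years <- df$end - df$start + 1
  expanded <- df[rep(seq_len(nrow(df)), times = n_years), , drop = FALSE]
  expanded$year <- unlist(Map(seq, df$start, df$end))
  # Group by year, plus the facet column when one is given
  groups <- c("year", facet_column)
  counts <- aggregate(expanded$key, by = expanded[groups], FUN = length)
  names(counts)[ncol(counts)] <- "n"
  counts[do.call(order, unname(as.list(counts[groups]))), ]
}

# Same toy data as in the example above
toy <- data.frame(start = c(1204, 1202, 1201, 1201),
                  end = c(1208, 1208, 1208, 1208),
                  key = 1:4,
                  group = c("a", "a", "b", "b"))

per_group <- count_per_year(toy, facet_column = "group")
overall <- count_per_year(toy)
```

`per_group` should reproduce the grouped counts shown above, and `overall` gives the total per year when no facet is requested.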
@stijnvanhoey I'll try to implement this. The group in the example would be the "facet_column"?
@SanderDevisscher I have some commits almost ready to push, so take a coffee first ;-)
Something to think about: how to treat NA for, respectively:
startDate -> not taking into account?
endDate -> just take now()?
A discussion for @qgroom and @timadriaens. In analogy with #18, I would tend to not take them into account.
@LienReyserhove, while creating the checklist, how did you treat established species with NA in endDate?
@SanderDevisscher the code update is pushed. There is a remaining issue with the second ggplot graph (I added ISSUE in the code, but I have to go to a meeting now...).
@stijnvanhoey since your update I get this error:
Most likely it's the patchwork package, so I disabled it.
I was able to produce this:
Kingdom
Phylum
Pathway level1
However, this is the result for family and other highly diverse facet columns:
@stijnvanhoey and @SanderDevisscher, with respect to the NAs in the data:
If the end date is NA, we used to consider the species to be present until the date of the last publication of the checklist, which in the case of MAP is 2018.
However, we recently agreed to change this approach and to consider these species to be present only in that specific year. E.g. if a species was first recorded in 1982 and has no end date, then eventDate is 1982/1982. The same applies when there is no start_year. We discussed this issue for the Rust fungi checklist, and I suggest keeping this approach for the MAP. @qgroom, do you agree?
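As a sketch of what that rule could look like on the startDate/endDate columns used by the indicator (base R; `fill_missing_years` and the toy `checklist` are hypothetical, and the actual checklist mapping may do this differently):

```r
# Hypothetical helper: a record with only one known year is treated as present
# in that single year, e.g. start 1982 with no end becomes 1982/1982
fill_missing_years <- function(df) {
  missing_end <- is.na(df$endDate) & !is.na(df$startDate)
  missing_start <- is.na(df$startDate) & !is.na(df$endDate)
  df$endDate[missing_end] <- df$startDate[missing_end]
  df$startDate[missing_start] <- df$endDate[missing_start]
  df
}

# Toy data: missing end, missing start, and a complete record
checklist <- data.frame(
  startDate = c(1982, NA, 1990),
  endDate = c(NA, 2001, 1995)
)
filled <- fill_missing_years(checklist)
```

Records where both years are missing stay NA in this sketch, so they would still need a separate decision.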
@stijnvanhoey what you suggest in https://github.com/trias-project/pipeline/issues/20#issuecomment-381867702 would also allow us to make the chart with a line (start) and an error line (year) that I suggest in https://github.com/trias-project/pipeline/issues/18#issuecomment-380814490.
With regard to the NA for endDate; currently indeed not an issue:
> nrow(data %>% filter(is.na(endDate), !is.na(startDate)))
[1] 0
versus:
> nrow(data %>% filter(!is.na(endDate), !is.na(startDate)))
[1] 13438
but to make sure, what do we do in the visualisation after the check for endDate NA when startDate not NA:
@peterdesmet I'm not completely sure how your suggestion will exactly turn out, but that is indeed the case. Implementation is now refactored and @SanderDevisscher made sure it works with the subplots as well in the meanwhile. So, your uncertainty suggestion seems like an appropriate next step.
I will work towards more of the checklist species having a degreeOfEstablishment so we can use this to make sensible assumptions as to whether something might still exist.
Yes, I think we must change our approach, i.e. we must always integrate information on endDate and startDate in the checklist to avoid confusion. This will depend on the presence of degreeOfEstablishment information, so this needs to be checked for each checklist separately. To be continued...
@SanderDevisscher fixed that issue, see latest commit on the branch
@LienReyserhove, in the meantime this commit adds warning messages to the function to inform the user if these dates are missing.
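For readers without access to the commit, a check of this kind might look roughly as follows. This is an illustrative sketch only: `warn_missing_dates` is a hypothetical name and the message wording is made up, not taken from the commit.

```r
# Hypothetical helper, not the function from the commit: warn when dates are missing
warn_missing_dates <- function(df) {
  n_missing <- c(
    startDate = sum(is.na(df$startDate)),
    endDate = sum(is.na(df$endDate))
  )
  for (column in names(n_missing)) {
    if (n_missing[[column]] > 0) {
      warning(n_missing[[column]], " record(s) have no ", column, call. = FALSE)
    }
  }
  # Return the counts invisibly so callers can inspect them if needed
  invisible(n_missing)
}

# Toy data: one record without startDate, two without endDate
checklist_with_gaps <- data.frame(
  startDate = c(1982, NA, 1990),
  endDate = c(NA, NA, 1995)
)
```

Calling `warn_missing_dates(checklist_with_gaps)` would raise one warning per column that contains missing values.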
This issue can be closed as the indicator is working well.
Description
This indicator measures the trends of all alien species introductions (publication of first observations). At the national level this indicator is useful to measure the trends in the presence/occurrence of alien (and potentially invasive) species and to inform decisions on the prevention of alien species introductions and the management and control of invasive species causing impacts on biodiversity and ecosystems. It is based on the same information as #17 but is an alternative representation in line with international policy indicators on (invasive) alien species such as EU headline indicators, SEBI, Aichi, IPBES.
Data needs and data output are the same as #17.
Visualisation
A line plot is envisaged, with colours for breakdowns. Visualizing uncertainty due to the use of time periods is discussed in #18.