trias-project / indicators

📈 Alien species indicators
https://trias-project.github.io/indicators/
MIT License

Indicator: cumulative number of alien species #20

Closed timadriaens closed 5 years ago

timadriaens commented 6 years ago

Description

This indicator measures trends in alien species introductions (based on the publication of first observations). At the national level, it is useful for tracking trends in the presence/occurrence of alien (and potentially invasive) species, and for informing decisions on the prevention of alien species introductions and on the management and control of invasive species that impact biodiversity and ecosystems. It is based on the same information as #17 but is an alternative representation, in line with international policy indicators on (invasive) alien species such as the EU headline indicators, SEBI, Aichi and IPBES.

Data needs and data output are the same as #17.

Visualisation

A line plot is envisaged, with colours for breakdowns. Visualizing the uncertainty due to the use of time periods is discussed in #18.

stijnvanhoey commented 6 years ago

As a reference, this is how the figure looked in the previous indicator report: image

SanderDevisscher commented 6 years ago

Due to lack of habitat information I opted to just create the cumulative sum of alien species. The script is based on earlier work from Stijn (see https://github.com/inbo/natuurindicatoren/blob/master/src/bedreiging-door-nieuwe-uitheemse-diersoorten.Rmd) and Damiano and calculates how many species are present in a given year. Presence for a given year is determined by checking, per species, if the year falls between the startDate (first observation) and endDate (last observation) for that species.
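
The presence check described above can be sketched in base R as follows. This is only a minimal illustration, not the script referred to in the comment: the toy data frame `species` and the helper `count_present()` are hypothetical names, and rows with a missing startDate or endDate are simply not counted here.

```r
# Hypothetical toy data: one row per species, with the year of the
# first (startDate) and last (endDate) observation.
species <- data.frame(
  key = 1:4,
  startDate = c(1950, 1952, 1955, 1955),
  endDate = c(1960, 1953, 1958, 1960)
)

# Count the species whose startDate-endDate range contains `year`.
# Comparisons involving NA are dropped by na.rm = TRUE (an assumption).
count_present <- function(data, year) {
  sum(data$startDate <= year & year <= data$endDate, na.rm = TRUE)
}

# Evaluate the count for every year in the period of interest.
years <- 1950:1960
result <- data.frame(
  year = years,
  n = vapply(years, function(y) count_present(species, y), integer(1))
)
```

The resulting `result` data frame has one row per year and could be passed to a line or bar graph.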

@timadriaens, @qgroom and @SoVDH should determine whether to use a line or bar graph.

English line graph: image

English bar graph: image

Suggestion: when using a bar graph, the years should be grouped (for example by decade).

qgroom commented 6 years ago

We only need to start the graph much later. For policy purposes we could start somewhere between 1900 and 1950. It might be interesting to have the curve split into plants, vertebrates, invertebrates and fungi and others.

stijnvanhoey commented 6 years ago

As this is relevant for the other indicators as well, @SanderDevisscher and @Yasmine-Verzelen, make sure this is a separately defined variable, e.g.:

start_year_plot <- 1945

stijnvanhoey commented 6 years ago

@SanderDevisscher and @Yasmine-Verzelen, I would also like to agree on the step size of the year x-scale, e.g.:

x_scale_stepsize <- 10
scale_x_continuous(breaks = seq(start_year_plot, max(data$startDate), x_scale_stepsize))

We can include this idea in each indicator graph. @Yasmine-Verzelen make x_scale_stepsize an input argument of your function as well (with default 10).
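
As a rough sketch of what such an input argument could look like, the break positions can be computed by a small helper with a default step size of 10. The function name `year_breaks()` and the end year 2018 are assumptions made for illustration only; the step size affects where the axis labels are placed, not how the data are aggregated.

```r
# Hypothetical helper: positions of the x-axis labels between two years.
# x_scale_stepsize defaults to 10, as suggested above.
year_breaks <- function(start_year, end_year, x_scale_stepsize = 10) {
  seq(from = start_year, to = end_year, by = x_scale_stepsize)
}

start_year_plot <- 1945

# Default: a label every 10 years.
breaks_default <- year_breaks(start_year_plot, 2018)

# A finer label spacing, e.g. every 5 years.
breaks_5 <- year_breaks(start_year_plot, 2018, x_scale_stepsize = 5)
```

The returned vector could then be passed as the breaks argument of scale_x_continuous().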

qgroom commented 6 years ago

10 years would make it a very insensitive indicator. In policy terms, 10 years is a long time. I would not go longer than 5 years unless it is absolutely necessary.

stijnvanhoey commented 6 years ago

We are talking about labels only, not aggregations, so that will be ok ;-)

stijnvanhoey commented 6 years ago

compare from 1800: image

versus from 1900: image

versus from 1950: image

@qgroom, does this fit with your idea of making it more policy-oriented?

stijnvanhoey commented 6 years ago

@timadriaens can you have a check as well?

qgroom commented 6 years ago

Indeed! @LienReyserhove, @timadriaens and I need to work more on getting species into the unified checklist. Some of the steep rise is the result of increased prospecting for plants. This will not be so distinct as we add more taxa.

stijnvanhoey commented 6 years ago

Yesterday, we were a bit stuck on an implementation that makes it easy to add facet options to the graph. I revisited the implementation and I think we can solve this by altering the data set: adding individual rows for each year in the start/end range. By doing so, we can simply count rows (group_by() %>% count()), both for all records and per category when faceting/grouping.

The concept is as follows:

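library(dplyr)
library(tidyr)

test <- data.frame(start = c(1204, 1202, 1201, 1201),
                   end = c(1208, 1208, 1208, 1208),
                   key = 1:4,
                   group = c("a", "a", "b", "b"),
                   stringsAsFactors = TRUE)  # factor column, as in the output below
# Expand each record to one row per year in its start:end range
test <- test %>%
    rowwise() %>%
    mutate(year = list(start:end)) %>%
    ungroup() %>%
    unnest(year)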

with the output:

   start   end   key group  year
   <dbl> <dbl> <int> <fct> <int>
 1  1204  1208     1 a      1204
 2  1204  1208     1 a      1205
 3  1204  1208     1 a      1206
 4  1204  1208     1 a      1207
 5  1204  1208     1 a      1208
 6  1202  1208     2 a      1202
 7  1202  1208     2 a      1203
 8  1202  1208     2 a      1204

As such, the problem becomes a regular counting problem, grouped by year and any other variable for which you want grouped counts, e.g.

test %>%
    group_by(year, group)  %>%
    count()

resulting in

    year group     n
   <int> <fct> <int>
 1  1201 b         2
 2  1202 a         1
 3  1202 b         2
 4  1203 a         1
 5  1203 b         2
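
The grouped counts above give the number of records present per year. For a cumulative number of first observations, as named in the issue title, one possible reading is to count the start years and take a running sum. The sketch below uses base R and the start years of the toy example; the variable names are hypothetical, and this is a different calculation from the presence-based count shown above.

```r
# Start years of the four toy records from the example above.
first_obs <- c(1204, 1202, 1201, 1201)

# All years in the covered period, including years without new records.
years <- min(first_obs):max(first_obs)

# Number of records first observed in each year (0 where there are none).
new_per_year <- as.integer(table(factor(first_obs, levels = years)))

# Running total of first observations.
cumulative <- data.frame(year = years, n = cumsum(new_per_year))
```

Years without a new record keep the previous total, so the curve never decreases.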

SanderDevisscher commented 6 years ago

@stijnvanhoey I'll try to implement this. Would group in the example be the "facet_column"?

stijnvanhoey commented 6 years ago

@SanderDevisscher I have some commits almost ready to push, so take a coffee first ;-)

something to think about: How to treat NA for respectively:

SanderDevisscher commented 6 years ago

This is a discussion for @qgroom and @timadriaens. In analogy with #18, I would tend not to take them into account.

SanderDevisscher commented 6 years ago

@LienReyserhove, while creating the checklist, how did you treat established species with NA's in endDate?

stijnvanhoey commented 6 years ago

@SanderDevisscher the code update is pushed. There is a remaining issue with the second ggplot graph (I added ISSUE in the code, but I have to go to a meeting now...).

SanderDevisscher commented 6 years ago

@stijnvanhoey since your update I get this error: image. Most likely it's the patchwork package, so I disabled it.

SanderDevisscher commented 6 years ago

I was able to produce this:

Kingdom: image

Phylum: image

Pathway level1: image

However, this is the result for family and other diverse facet columns: image

LienReyserhove commented 6 years ago

@stijnvanhoey and @SanderDevisscher with respect to the NA's in the data:

If the end date is NA, we used to consider the species present until the date of the last publication of the checklist, which in the case of MAP is 2018.

However, we recently agreed to change this approach and to consider these species present only in that specific year. E.g. if a species was first recorded in 1982 and has no end date, then eventDate is 1982/1982. The same applies when there's no start_year. We discussed this issue for the Rust fungi checklist, and I suggest keeping this approach for the MAP. @qgroom, do you agree?
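
A minimal base R sketch of this rule, under the assumption that "same thing" means a missing start year is set to the end year: the helper `fill_missing_years()` and the toy checklist are hypothetical and are not part of any checklist mapping script.

```r
# Hypothetical helper: a record with only one known year is treated as
# present in that single year (e.g. 1982/NA becomes 1982/1982).
fill_missing_years <- function(data) {
  no_end <- is.na(data$endDate) & !is.na(data$startDate)
  no_start <- is.na(data$startDate) & !is.na(data$endDate)
  data$endDate[no_end] <- data$startDate[no_end]
  data$startDate[no_start] <- data$endDate[no_start]
  data
}

# Toy checklist: record 1 has no end year, record 2 has no start year.
checklist <- data.frame(
  key = 1:3,
  startDate = c(1982, NA, 1990),
  endDate = c(NA, 2001, 1995)
)

filled <- fill_missing_years(checklist)
```

Records with both years missing are left untouched, since neither year can be inferred.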

LienReyserhove commented 6 years ago

See also https://github.com/trias-project/uredinales-belgium-checklist/pull/6#issuecomment-379056292

peterdesmet commented 6 years ago

@stijnvanhoey what you suggest in https://github.com/trias-project/pipeline/issues/20#issuecomment-381867702 would also allow us to make the chart with a line (start) and an error line (year) that I suggested in https://github.com/trias-project/pipeline/issues/18#issuecomment-380814490.

stijnvanhoey commented 6 years ago

With regard to NA for endDate: this is indeed currently not an issue:

> nrow(data %>% filter(is.na(endDate), !is.na(startDate)))
[1] 0

versus:

> nrow(data %>% filter(!is.na(endDate), !is.na(startDate)))
[1] 13438

but to make sure, what do we do in the visualisation after the check for endDate NA when startDate not NA:

stijnvanhoey commented 6 years ago

@peterdesmet I'm not completely sure how your suggestion will turn out exactly, but that is indeed the case. The implementation is now refactored, and in the meantime @SanderDevisscher made sure it works with the subplots as well. So, your uncertainty suggestion seems like an appropriate next step.

qgroom commented 6 years ago

I will work towards more of the checklist species having a degreeOfEstablishment so we can use this to make sensible assumptions as to whether something might still exist.

LienReyserhove commented 6 years ago

Yes, I think we must change our approach, i.e. we must always integrate information on endDate and startDate in the checklist to avoid confusion. This will depend on the presence of degreeOfEstablishment information, so this needs to be checked for each checklist separately. To be continued...

stijnvanhoey commented 6 years ago

@SanderDevisscher fixed that issue, see latest commit on the branch

stijnvanhoey commented 6 years ago

@LienReyserhove, in the meantime this commit adds warning messages to the function to inform the user if these dates are missing.
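
For illustration only, such a check could look roughly like the sketch below. It is not the code from the commit: the function name `warn_missing_dates()` and the message wording are assumptions.

```r
# Hypothetical check: warn when startDate or endDate values are missing,
# then return the data unchanged so the function can be used in a pipeline.
warn_missing_dates <- function(data) {
  n_no_start <- sum(is.na(data$startDate))
  n_no_end <- sum(is.na(data$endDate))
  if (n_no_start > 0) {
    warning(n_no_start, " record(s) have no startDate.", call. = FALSE)
  }
  if (n_no_end > 0) {
    warning(n_no_end, " record(s) have no endDate.", call. = FALSE)
  }
  invisible(data)
}

# Complete records pass silently.
complete <- warn_missing_dates(data.frame(startDate = 1990, endDate = 1995))
```

Calling the function on data with missing dates raises one warning per affected column and still returns the data.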

damianooldoni commented 5 years ago

This issue can be closed as indicator is working well.