inbo / alien-species-portal

Portal for alien and invasive species indicators
MIT License

Observations stop at 2022 #69

Closed: SanderDevisscher closed this issue 8 months ago

SanderDevisscher commented 9 months ago

Describe the bug

Observations stop at 2022 even though "df_timeseries.Rdata" also contains data for 2023; at least "df_ts" at the end of 05_occurrence_indicators_preprocessing.Rmd does. For example, Oryctolagus cuniculus (taxonKey 2436940) has at least one observation in grid cell 1kmE3843N3089 in 2023.

To Reproduce

Steps to reproduce the behavior:

  1. Build the app in Docker
  2. Click on 'global indicators'
  3. Filter taxa for 'Oryctolagus cuniculus'
  4. See the first part of the error: last observed is 2022 while it should be 2023
  5. Click on 'observations'
  6. See the other part of the error: observations only go up to 2022 while they should go up to 2023

Expected behavior

Observations and last observed should be data-driven and based on the occurrence cubes (i.e. df_timeseries). For 'Oryctolagus cuniculus' and many similar taxa, this should be 2023.
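A minimal sketch of the data-driven "observations" behavior described above, assuming a data frame shaped like the timeseries queried later in this thread (columns taxonKey, year, obs). The function name yearly_observations and the output column n_rows are hypothetical, not part of the portal code. Like the `table(...)` call used in the thread, it counts rows with at least one observation per year, so the last year shown is whatever the data contains rather than a fixed end year.

```r
# Hypothetical helper, not part of the portal code.
# Assumes `timeseries` has columns taxonKey, year and obs.
yearly_observations <- function(timeseries, taxon_key) {
  # Keep rows for the requested taxon with at least one observation,
  # mirroring the `obs >= 1` filter used in this thread.
  # which() drops rows where the condition is NA.
  keep <- which(timeseries$taxonKey == taxon_key & timeseries$obs >= 1)
  years <- timeseries$year[keep]
  # Count rows per year; only years present in the data appear,
  # so the range is never capped at a hard-coded final year.
  counts <- table(years)
  data.frame(year = as.integer(names(counts)),
             n_rows = as.vector(counts))
}
```

Usage would be along the lines of `yearly_observations(timeseries, 2436940)`, with the result feeding the histogram.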


mvarewyck commented 9 months ago

The main table is based on the file "data_input_checklist_indicators.tsv", which seems to have 2022 as the last observed year:

> exotenData[grepl("Oryctolagus cuniculus", exotenData$scientificName), c("scientificName", "last_observed", "locality")]
                           scientificName last_observed
1: Oryctolagus cuniculus (Linnaeus, 1758)          2022
2: Oryctolagus cuniculus (Linnaeus, 1758)          2022
3: Oryctolagus cuniculus (Linnaeus, 1758)          2022
4: Oryctolagus cuniculus (Linnaeus, 1758)          2022
                         locality
1:                         België
2: Brussels Hoofdstedelijk Gewest
3:                     Vlaanderen
4:                       Wallonië

The histogram is based on the file "be_alientaxa_cube.csv", whose most recent observations seem to be from 2021:

> rawData = fread(file.path("~/git/alien-species-portal/dataS3", "be_alientaxa_cube_aspbo.csv"), stringsAsFactors = FALSE, na.strings = "", drop = "min_coord_uncertainty")
> table(rawData[rawData$taxonKey == 2436940, "year"])
year
1912 1930 1931 1933 1934 1936 1937 1938 1939 1940 1945 1947 1948 1949 1950 1951 
   2    3    1    2    3    8    7   11   21   13    7    3   16   11    5    2 
1952 1953 1954 1960 1961 1967 1975 1976 1981 1982 1984 1985 1986 1987 1988 1989 
   1    4    2    3    3    2    2    9    1   10    5   11   36    9    2   10 
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 
  36   18   40   84  104  446  267  342  377  280  208  182  217    5    5    7 
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 
  62   32   86  315  287  339  314  326  427  391  443  411  391  470  544  665 

For the timeseries, however, we do indeed have observations in 2023:

> table(timeseries[timeseries$taxonKey == 2436940 & timeseries$obs >= 1, "year"])

1951 1952 1953 1954 1960 1961 1967 1975 1976 1981 1982 1984 1985 1986 1987 1988 
   2    2    4    2    4    4    2    2   12    2   10    6   11   34    9    2 
1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 
  12   46   17   42   86  103  374  255  325  478  280  210  198  217    5    4 
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 
   5   64   39   79  326  377  516  607  553  837  808  780  880 1088 1366 1512 
2021 2022 2023 
1453 1096 1347 
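To see which input lags behind, one could compare the most recent year with data per source for a given taxon. This is a hypothetical helper (latest_year_by_source is not part of the portal or aspbo code); it assumes every source has taxonKey and year columns, and treats a source that also has an obs column as a timeseries in which only rows with obs >= 1 count, as in the query above.

```r
# Hypothetical helper: most recent year with data for one taxon per source,
# flagging sources that lag behind the newest one.
latest_year_by_source <- function(sources, taxon_key) {
  latest <- vapply(sources, function(df) {
    keep <- df$taxonKey == taxon_key
    # Timeseries-style inputs may contain zero-observation rows; ignore them.
    if ("obs" %in% names(df)) keep <- keep & df$obs >= 1
    years <- df$year[which(keep)]
    if (length(years) == 0) NA_integer_ else as.integer(max(years))
  }, integer(1))
  latest <- unname(latest)
  # If no source has data, max() would warn and return -Inf; lagging is then NA.
  newest <- suppressWarnings(max(latest, na.rm = TRUE))
  data.frame(source = names(sources),
             latest_year = latest,
             lagging = latest < newest,
             stringsAsFactors = FALSE)
}
```

For example, `latest_year_by_source(list(cube = rawData, timeseries = timeseries), 2436940)` would show the cube stopping at 2021 and the timeseries at 2023.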

@SanderDevisscher It looks like the files in https://github.com/inbo/aspbo/tree/uat/data/output/UAT_processing need an update (see Data sources)?

SanderDevisscher commented 9 months ago

Re: the main table (based on "data_input_checklist_indicators.tsv", last observed year 2022):

The checklist will probably lag behind the cube since it is not data-driven. Maybe we can overwrite last_observed based on be_alientaxa_cube.csv.
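A minimal sketch of the overwrite suggested above, under loud assumptions: the checklist is assumed to carry a taxonKey column (the output above only shows scientificName, last_observed and locality), and the cube is assumed to have taxonKey and year columns. overwrite_last_observed is a hypothetical name. The sketch ignores locality and applies the most recent cube year of a taxon to every locality row of that taxon; a per-region version would need region information in the cube.

```r
# Hypothetical sketch: update last_observed in the checklist from the cube.
# Assumes checklist has taxonKey and last_observed; cube has taxonKey and year.
overwrite_last_observed <- function(checklist, cube) {
  # Most recent year per taxon in the cube.
  cube_last <- aggregate(year ~ taxonKey, data = cube, FUN = max)
  names(cube_last)[names(cube_last) == "year"] <- "cube_last_observed"
  # Left join: checklist rows without cube data are kept.
  out <- merge(checklist, cube_last, by = "taxonKey", all.x = TRUE)
  # Keep the later of the two years so last_observed never moves backwards;
  # na.rm = TRUE falls back to the checklist value when the cube has none.
  out$last_observed <- pmax(out$last_observed, out$cube_last_observed,
                            na.rm = TRUE)
  out$cube_last_observed <- NULL
  out
}
```

Taking the later year instead of blindly overwriting is a design choice: it keeps a checklist year that comes from a source other than the cube.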

Re: the histogram (based on "be_alientaxa_cube.csv", last observations in 2021):

I've only updated the timeseries derived from be_alientaxa_cube.csv, but never be_alientaxa_cube.csv itself. I'll add some logic to the timeseries flow to export the cube as well.
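One possible shape for exporting the cube from the timeseries flow, as a hypothetical sketch (export_cube and its arguments are not from the aspbo code): before writing, it checks that the cube's newest year is not older than the newest year with observations in the timeseries, which is the kind of mismatch reported in this issue. It uses base R's write.csv; since these files are read with data.table::fread, data.table::fwrite would be a natural alternative.

```r
# Hypothetical sketch: export the cube next to the timeseries built from it.
# Assumes cube has a year column; timeseries has year and obs columns.
export_cube <- function(cube, timeseries, path) {
  cube_max <- max(cube$year)
  ts_max <- max(timeseries$year[which(timeseries$obs >= 1)])
  # Refuse to export a cube that ends earlier than the timeseries.
  if (cube_max < ts_max) {
    stop(sprintf("cube ends in %s but timeseries ends in %s", cube_max, ts_max))
  }
  utils::write.csv(cube, path, row.names = FALSE)
  invisible(path)
}
```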


SanderDevisscher commented 9 months ago

Update: the data for 2023 is now included; however, 2023 cannot be selected.

SanderDevisscher commented 9 months ago

I've reverted be_alientaxa_cube because its spread just isn't correct (see https://github.com/trias-project/occ-cube-alien/issues/44).