inbo / alien-species-portal

Portal for alien and invasive species indicators
MIT License

overwrite last_observed in checklist_indicators by alientaxa_cube #70

Open mvarewyck opened 9 months ago

mvarewyck commented 9 months ago

The main table is based on the file "data_input_checklist_indicators.tsv", which seems to have 2022 as the last observed year:

> exotenData[grepl("Oryctolagus cuniculus", exotenData$scientificName), c("scientificName", "last_observed", "locality")]
                           scientificName last_observed                       locality
1: Oryctolagus cuniculus (Linnaeus, 1758)          2022                         België
2: Oryctolagus cuniculus (Linnaeus, 1758)          2022 Brussels Hoofdstedelijk Gewest
3: Oryctolagus cuniculus (Linnaeus, 1758)          2022                     Vlaanderen
4: Oryctolagus cuniculus (Linnaeus, 1758)          2022                       Wallonië

The checklist will probably lag behind the cube since it is not data-driven. Maybe we can overwrite last_observed based on be_alientaxa_cube.csv.

Originally posted by @SanderDevisscher in https://github.com/inbo/alien-species-portal/issues/69#issuecomment-1864315018

mvarewyck commented 9 months ago

I had a quick look:

SanderDevisscher commented 9 months ago

I had a quick look:

  • the file be_alientaxa_cube.csv doesn't contain locality information. The timeseries file does contain this info, so could we optionally copy the last year from there? @SanderDevisscher

The timeseries is based on be_alientaxa_cube, so that is OK for me.

  • we need to check whether the newly copied value for last year is actually an improvement: it should not introduce NA values or overwrite with an earlier year

Indeed, we should only overwrite the last observed year when the new year is larger and not NA; otherwise we keep the checklist value.

  • we need to make sure the indicators file is created after the helper file for last year is updated. Order in aspbo?

current workflow:

  1. data_input_checklist_indicators.tsv is updated every 1st of the month (see get_griis_checklist.yaml)
  2. The PR from step 1 triggers the download of be_alientaxa_cube from Zenodo as part of update_indicators_preprocessing.yaml, which also creates the timeseries.
  3. The PRs from steps 1 and 2 trigger upload_files_processing.yaml

What do you think needs to change?

SanderDevisscher commented 9 months ago

Of course, the new year should not be in the future either 😅
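
The overwrite rule discussed so far (candidate year not NA, later than the checklist value, and not in the future) can be sketched in base R as follows. This is only an illustration: `update_last_observed()` and its argument names are hypothetical, and filling in a missing checklist year with a valid candidate is an assumption that the thread does not discuss.

```r
# Hypothetical helper (not part of the portal code): decide per row which
# last-observed year to keep.
#   checklist_year: last_observed from the checklist
#   cube_year:      candidate year derived from the cube/timeseries
update_last_observed <- function(checklist_year, cube_year,
                                 current_year = as.integer(format(Sys.Date(), "%Y"))) {
  # A candidate is usable only if it is known and not in the future
  valid <- !is.na(cube_year) & cube_year <= current_year
  # It replaces the checklist value only if it is more recent
  # (assumption: a missing checklist value is also replaced)
  improves <- valid & (is.na(checklist_year) | cube_year > checklist_year)
  ifelse(improves, cube_year, checklist_year)
}

# Made-up example values: newer year, NA candidate, newer year,
# missing checklist value, candidate in the future
result <- update_last_observed(
  checklist_year = c(2022, 2022, 1201, NA, 2020),
  cube_year      = c(2023, NA, 2019, 2021, 2999),
  current_year   = 2024
)
result
```

Passing `current_year` explicitly keeps the example reproducible; by default the function uses the year of the system date.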

mvarewyck commented 8 months ago

What do you think needs to change?

Workflow looks okay. I will update the code in createTabularData() to incorporate the info from timeseries.
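
A minimal sketch of how the timeseries could feed into the checklist table, using toy data. This is not the actual createTabularData() code. The columns `taxonKey`, `year` and `obs` (timeseries) and `nubKey`, `locality` and `last_observed` (checklist) appear in this thread; the `locality` column in the timeseries and the assumption that `taxonKey` corresponds to `nubKey` are hypothetical.

```r
# Toy stand-ins for the real tables; all values are made up for illustration.
checklist <- data.frame(
  nubKey        = c(1L, 1L, 2L),
  locality      = c("flanders", "wallonia", "flanders"),
  last_observed = c(2020, 2022, 2018)
)
timeseries <- data.frame(
  taxonKey = c(1L, 1L, 1L, 2L),
  year     = c(2021L, 2023L, 2019L, 2017L),
  locality = c("flanders", "flanders", "wallonia", "flanders"),  # hypothetical column
  obs      = c(3L, 1L, 2L, 5L)
)

# Latest year with at least one observation, per taxon and locality
observed  <- timeseries[timeseries$obs > 0, ]
last_year <- aggregate(year ~ taxonKey + locality, data = observed, FUN = max)
names(last_year) <- c("nubKey", "locality", "cube_last_observed")

# Left join: keep every checklist row, also taxa missing from the timeseries
merged <- merge(checklist, last_year, by = c("nubKey", "locality"), all.x = TRUE)

# Overwrite only where the timeseries year is known and more recent;
# which() drops NA comparisons, so those rows keep the checklist value
newer <- which(merged$cube_last_observed > merged$last_observed)
merged$last_observed[newer] <- merged$cube_last_observed[newer]
merged$cube_last_observed <- NULL
merged
```

The left join is the important design choice here: taxa that are absent from the timeseries keep their checklist value instead of being dropped or set to NA.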

mvarewyck commented 2 months ago

@mvarewyck issue https://github.com/inbo/alien-species-portal/issues/70 seems to be persistent, example: Cyprinus carpio (141117232)

mvarewyck commented 2 months ago

@mvarewyck issue #70 seems to be persistent, example: Cyprinus carpio (141117232)

The current data for Cyprinus carpio does indeed have recent observations for some regions, but not for Belgium/Flanders:

> exotenData[exotenData$species == "Cyprinus carpio", c("locality", "nubKey", "scientificName", "first_observed", "last_observed")]
   locality  nubKey                 scientificName first_observed last_observed
     <char>   <int>                         <char>          <int>         <num>
1:   België 4286975 Cyprinus carpio Linnaeus, 1758           1201          1201
2: brussels 4286975 Cyprinus carpio Linnaeus, 1758           2008          2021
3: flanders 4286975 Cyprinus carpio Linnaeus, 1758           1201          1201
4: wallonia 4286975 Cyprinus carpio Linnaeus, 1758           2007          2022

In the timeseries data (where we get more recent 'last observed' dates from), I can't find this species. I think it already includes species introduced before 1950 (see no. of rows) @soriadelva @SanderDevisscher, but I can't explain why this species is not there:

> dim(timeseries)
[1] 51397730       12
> timeseries[grep("4286975", as.character(timeseries$taxonKey)), ]
Empty data.table (0 rows and 12 cols): taxonKey,year,eea_cell_code,obs,pa_obs,cobs...
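
As a side note on the lookup above: `grep()` matches the key as a substring, so in general it can also return longer keys that merely contain the same digits. Since it returned zero rows here, the conclusion stands, but an exact comparison is stricter and also makes it easy to list every checklist key that is missing from the timeseries. A small sketch with toy data (all keys except 4286975 are made up, and `checklist_keys` is a hypothetical name):

```r
# Toy stand-ins; every key except 4286975 is made up for illustration.
checklist_keys <- c(4286975L, 1111111L, 2222222L)
timeseries <- data.frame(
  taxonKey = c(1111111L, 1111111L, 2222222L, 142869751L),
  year     = c(2020L, 2021L, 2019L, 2022L)
)

# Substring match: also hits 142869751, which only contains the digits
n_substring <- nrow(timeseries[grep("4286975", as.character(timeseries$taxonKey)), ])

# Exact match: only rows whose key equals 4286975
n_exact <- nrow(timeseries[timeseries$taxonKey == 4286975L, ])

# All checklist keys that never occur in the timeseries
missing_keys <- setdiff(checklist_keys, unique(timeseries$taxonKey))
missing_keys
```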

SanderDevisscher commented 2 months ago

The alienTaxa cube is being reworked (see https://github.com/inbo/aspbo/pull/202) to include species with observations after 1950 independent of their introduction date.