inbo / aspbo

The alien species portal backoffice contains automated data preparation scripts for the [alien species portal](https://github.com/inbo/alien-species-portal).

Municipality assignment for Vespa velutina nesten data #170

Status: open. mvarewyck opened this issue 4 months ago.

mvarewyck commented 4 months ago

I think something goes wrong in the data processing for Vespa velutina. When loading the nesten data, the municipality column (NAAM) seems to contain the province instead. I read the following columns from the data:

Also suspicious is that the counts differ between NAAM and provincie:

> nestenData = sf::st_read("~/Downloads/nesten.geojson")
Reading layer `nesten' from data source `/home/mvarewyck/Downloads/nesten.geojson' using driver `GeoJSON'
Simple feature collection with 8740 features and 36 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2.567021 ymin: 50.18892 xmax: 6.365131 ymax: 51.46739
Geodetic CRS:  WGS 84
> table(nestenData$NAAM)

           Antwerpen           Henegouwen HoofdstedelijkGewest 
                 983                   52                   65 
             Limburg                 Luik            Luxemburg 
                 243                    6                    2 
               Namen      Oost-Vlaanderen       Vlaams-Brabant 
                   9                 3214                 1689 
       Waals-Brabant      West-Vlaanderen 
                  19                 2458 
> table(nestenData$provincie)

           Antwerpen           Henegouwen HoofdstedelijkGewest 
                 986                   47                   25 
             Limburg                 Luik            Luxemburg 
                 244                    3                    1 
               Namen             onbekend      Oost-Vlaanderen 
                   5                   33                 3245 
      Vlaams-Brabant        Waals-Brabant      West-Vlaanderen 
                1661                   15                 2475 

This happens even though the script does rename NAAM to gemeente at some point.
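A quick way to see which records disagree is to compare the two columns row by row instead of only comparing their totals. The sketch below is a minimal base R version run on a small hypothetical data frame standing in for `nestenData`; on the real sf object you would first drop the geometry with `sf::st_drop_geometry(nestenData)`. The helper name `find_mismatches` and the toy values are hypothetical.

```r
# Hypothetical helper: return the rows where NAAM and provincie disagree.
# A missing value in either column is treated as a mismatch.
find_mismatches <- function(df) {
  differs <- df$NAAM != df$provincie
  differs[is.na(differs)] <- TRUE
  df[differs, c("NAAM", "provincie"), drop = FALSE]
}

# Toy data (hypothetical), standing in for sf::st_drop_geometry(nestenData)
nesten <- data.frame(
  NAAM      = c("Antwerpen", "Limburg", "Oost-Vlaanderen", "Namen"),
  provincie = c("Antwerpen", "onbekend", "Oost-Vlaanderen", "Luik"),
  stringsAsFactors = FALSE
)

mismatches <- find_mismatches(nesten)

# Cross-tabulate which value in NAAM pairs with which value in provincie
print(table(NAAM = mismatches$NAAM, provincie = mismatches$provincie))
```

On the real data, the cross-table would show which NAAM/provincie combinations account for the differing counts, which helps locate the records that were patched in only one of the two columns.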

Discovered via alien-species-portal PR #74.

SanderDevisscher commented 4 months ago

@soriadelva was already working on this; see #160. I expect the fix to be merged after the PR of the #160 branch.

soriadelva commented 3 months ago

The issue seems to be related to the iAsset data. Some locations have wrong coordinates and are corrected manually in the script, but only in the column provincie and not in the column NAAM, which explains the discrepancy (see https://github.com/inbo/aspbo/blob/5c9da5892c3682fa64a99b2a8d00937394aa0794/src/Vespa%20velutina%20management/update_inputs.Rmd#L168-L174). A second problem follows from this: later in the script, the iAsset data are intersected and left-joined with communes.geojson to assign a commune to each coordinate. That step does not exclude the wrong coordinates, so those records are assigned the wrong commune (see https://github.com/inbo/aspbo/blob/5c9da5892c3682fa64a99b2a8d00937394aa0794/src/Vespa%20velutina%20management/update_inputs.Rmd#L381-L386). @jrhillae knows about the issues with these data and will have a look.
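One way to avoid both problems described above is to correct the coordinates themselves, before the province and the commune are derived from them, and to drop records whose location is unknown before the commune join. The base R sketch below assumes a hypothetical corrections table keyed on a hypothetical `id` column, with `x`/`y` (longitude/latitude) columns where `NA` means "location unknown"; it is not the code in `update_inputs.Rmd`. In the actual script the remaining points would then be joined to the communes, for example with `sf::st_join(points_sf, communes, join = sf::st_intersects)`.

```r
# Hypothetical helper: overwrite coordinates for the records listed in a
# corrections table, matched on a hypothetical `id` column.
# NA coordinates in `corrections` mean "location unknown".
apply_coordinate_corrections <- function(points, corrections) {
  idx <- match(corrections$id, points$id)
  found <- !is.na(idx)  # ignore corrections for ids that are not in `points`
  points$x[idx[found]] <- corrections$x[found]
  points$y[idx[found]] <- corrections$y[found]
  points
}

# Keep only points that can be located; only these should enter the
# spatial join with the communes.
locatable_points <- function(points) {
  points[!is.na(points$x) & !is.na(points$y), , drop = FALSE]
}

# Toy data (hypothetical): record 2 has wrong coordinates and an unknown
# location, record 3 has slightly wrong coordinates, id 99 does not exist.
points <- data.frame(id = 1:3,
                     x  = c(4.40, 0.00, 3.70),
                     y  = c(51.20, 0.00, 51.05))
corrections <- data.frame(id = c(2L, 3L, 99L),
                          x  = c(NA, 3.72, 5.00),
                          y  = c(NA, 51.06, 50.90))

fixed <- apply_coordinate_corrections(points, corrections)
to_join <- locatable_points(fixed)
print(to_join)
```

Correcting the location rather than a derived column means NAAM and provincie are both computed from the same coordinates and cannot drift apart.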

SanderDevisscher commented 3 months ago

Personally, I think it is cleaner to switch to using gemeente as input to the processing scripts, since it is more descriptive than NAAM. A similar related issue is GEWEST, which should be renamed to gewest for consistency.

@mvarewyck, what changes would this imply on the app side? And on the backoffice side? @soriadelva, what do you think?

soriadelva commented 3 months ago

> Personally, I think it is cleaner to switch to using gemeente as input to the processing scripts, since it is more descriptive than NAAM. A similar related issue is GEWEST, which should be renamed to gewest for consistency.

> @mvarewyck, what changes would this imply on the app side? And on the backoffice side? @soriadelva, what do you think?

I agree that this would be a lot clearer. If we apply this, I think it is best to apply it to all other datasets at the same time, to avoid confusion.
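The renaming proposed here could be applied to every dataset in one place, so that all outputs switch at once. A minimal base R sketch, assuming the datasets are held in a named list; the helper name `rename_columns` and the toy datasets are hypothetical. For a single sf object, `dplyr::rename(df, gemeente = NAAM)` is an alternative that keeps the geometry column intact.

```r
# Old -> new column names proposed in this thread
column_renames <- c(NAAM = "gemeente", GEWEST = "gewest")

# Hypothetical helper: rename only the columns that are present, so the same
# call works for datasets that have NAAM, GEWEST, both, or neither.
rename_columns <- function(df, renames = column_renames) {
  hits <- names(df) %in% names(renames)
  names(df)[hits] <- unname(renames[names(df)[hits]])
  df
}

# Toy datasets (hypothetical)
datasets <- list(
  nesten   = data.frame(NAAM = "Gent", provincie = "Oost-Vlaanderen"),
  gewesten = data.frame(GEWEST = "Vlaanderen", n = 1L)
)

renamed <- lapply(datasets, rename_columns)
print(lapply(renamed, names))
```

Keeping the mapping in one named vector means the app side only needs to follow a single, documented list of renamed columns.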

jrhillae commented 3 months ago

@SanderDevisscher, @soriadelva: I am planning to clean up the coordinates in the iAsset file within three weeks. Wrong coordinates will be corrected based on the field 'adress', or left empty if the location is unknown.

mvarewyck commented 3 months ago

> @mvarewyck, what changes would this imply on the app side? And on the backoffice side? @soriadelva, what do you think?

Renaming the columns would require only a minor change in the app code. Just let me know when this is done.