inbo / niche_vlaanderen

Python package to run the NICHE Vlaanderen model
https://inbo.github.io/niche_vlaanderen/
MIT License

Feature request: calibration module #262

Closed DriesAdriaens closed 1 year ago

DriesAdriaens commented 4 years ago

Niche model question

Every model needs to be calibrated. Calibration of the output of the NICHE Vlaanderen model is currently done manually outside the Python workflow, which makes it a time-consuming and error-prone step. We would appreciate it if this task could be fully automated in Python. Calibration is done by overlaying the actual presence of each vegetation type according to the BWK-Habitatkaart with the predicted presence given by the NICHE Vlaanderen raster output. A calibration score is then calculated as the percentage of overlap.

The actual presence can be supplied as a precalculated polygon file covering the whole of Flanders, with one attribute field per vegetation type that gives the estimated area of presence within each polygon as a percentage of the entire polygon (pHAB). This information is derived (outside the niche_vlaanderen package) from the BWK-Habitatkaart (officially published version, updated every two years) using a conversion table from the BWK-Habitatkaart to the NICHE vegetation typology. However, it would be good to allow custom shapefiles too, since the official map is often updated for the specific study area during a project in which NICHE Vlaanderen is used. The attribute fields could follow a standard naming protocol, though.

Example of attribute table:

| shape_id | (...) | pHAB1 | pHAB2 | pHAB... | pHABxxx |
|----------|-------|-------|-------|---------|---------|
| 22315    | (...) | 100   | 0     | 0       | 100     |
| 22316    | (...) | 40    | 60    | 0       | 0       |

pHABxxx: takes a value between 0 and 100 (percentage of shape_area with vegetation xxx).
xxx: integer corresponding to veg_code in niche_vegetation.csv (custom niche_vegetation.csv supported).
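To make the naming protocol concrete, here is a minimal sketch of how the pHABxxx fields could be recognised and checked. The function names `phab_columns` and `invalid_fields` are hypothetical and not part of niche_vlaanderen; the sketch only assumes that the attribute table is available as field names and plain dictionaries.

```python
import re

# Field names following the proposed protocol: "pHAB" followed by the veg_code.
PHAB_PATTERN = re.compile(r"^pHAB(\d+)$")


def phab_columns(field_names):
    """Return {veg_code: field_name} for all fields that follow the protocol."""
    mapping = {}
    for name in field_names:
        match = PHAB_PATTERN.match(name)
        if match:
            mapping[int(match.group(1))] = name
    return mapping


def invalid_fields(row, columns):
    """Return the pHAB field names whose value lies outside 0-100 for one row."""
    return [name for name in columns.values() if not 0 <= row[name] <= 100]


columns = phab_columns(["shape_id", "pHAB1", "pHAB2", "shape_area"])
row = {"shape_id": 22316, "pHAB1": 40, "pHAB2": 60, "shape_area": 1250.0}
print(columns, invalid_fields(row, columns))
```

The veg_codes found this way could then be compared with the veg_code column of the (custom) niche_vegetation.csv in use.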

The NICHE Vlaanderen predictions used for calibration are the raster files with the binary output of either a full or a simple NICHE Vlaanderen model whose input layers reflect the actual circumstances as closely as possible.

The calibration score per vegetation type is calculated as the ratio of the summed area of predicted presence in all polygons with actual presence to the summed area that is expected based on pHAB and each polygon's area. As polygons can be small, a dedicated (finer) resolution is to be used during the calculation to avoid spatial mismatch when overlaying polygon and raster data; otherwise calibration scores can be misleading. This internal resolution could be set explicitly (e.g. 1 m) or specified as a ratio of the model resolution (e.g. 20 if the model resolution is 20 m).
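As a rough illustration of the two ideas in this paragraph, the sketch below resolves the internal cell size and computes the score for one vegetation type from per-polygon areas. All names (`internal_cell_size`, `calibration_score`, the tuple layout) are assumptions made for this example, and the predicted area per polygon is taken as given rather than derived from an actual raster overlay.

```python
def internal_cell_size(model_resolution, cell_size=None, ratio=None):
    """Cell size (m) for the overlay, given explicitly or as a ratio of the model resolution."""
    if (cell_size is None) == (ratio is None):
        raise ValueError("specify exactly one of cell_size or ratio")
    if cell_size is not None:
        return float(cell_size)
    return model_resolution / ratio


def calibration_score(polygons):
    """Score for one vegetation type.

    polygons: iterable of (polygon_area_m2, phab_percent, predicted_area_m2).
    Only polygons with actual presence (pHAB > 0) contribute.
    """
    predicted = 0.0
    expected = 0.0
    for area, phab, predicted_area in polygons:
        if phab <= 0:
            continue
        predicted += predicted_area
        expected += area * phab / 100.0
    if expected == 0:
        return None  # vegetation type not present in any polygon
    return predicted / expected


print(internal_cell_size(20, ratio=20))  # 20 m model resolution, ratio 20
print(calibration_score([(1000, 100, 800), (500, 40, 100), (200, 0, 200)]))
```

Note that with this definition the score can exceed 1 when more area is predicted inside a polygon than pHAB expects; whether the predicted area should be capped per polygon is a design choice left open here.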

The expected output would be a table with, per vegetation type:

  1. the area with predicted presence among polygons,
  2. the area without predicted presence,
  3. the calibration score calculated as the ratio between (1) and (1)+(2),
  4. the number of polygons the score is based on.
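The four columns listed above could be assembled along these lines. This is only a sketch: the function name `summarise`, the record layout and the output keys are hypothetical, and the per-polygon areas with and without predicted presence are assumed to have been computed beforehand.

```python
from collections import defaultdict


def summarise(records):
    """Build the summary table per vegetation type.

    records: iterable of (veg_code, area_present_m2, area_absent_m2),
    one record per polygon in which the vegetation type actually occurs.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for veg_code, present, absent in records:
        entry = totals[veg_code]
        entry[0] += present  # (1) area with predicted presence
        entry[1] += absent   # (2) area without predicted presence
        entry[2] += 1        # (4) number of polygons
    table = {}
    for veg_code, (present, absent, n_polygons) in sorted(totals.items()):
        total = present + absent
        table[veg_code] = {
            "area_present": present,
            "area_absent": absent,
            "score": present / total if total else None,  # (3) = (1) / ((1)+(2))
            "n_polygons": n_polygons,
        }
    return table


for veg_code, row in summarise([(1, 300.0, 100.0), (1, 100.0, 300.0), (2, 50.0, 0.0)]).items():
    print(veg_code, row)
```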

Ideally, (1), (2) and (3) are given for both the simple and full model approach at once if a full model is specified. Additionally, results for the individual polygons could be informative as well (long table). The possibility to rejoin the latter to the original polygons based on a common id would further aid interpretation (wide table).
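For the per-polygon results, the step from the long table to a wide table that can be rejoined on a common id might look like the sketch below. The id field `shape_id` comes from the example attribute table; the function name `long_to_wide` and the `score_<veg_code>` column names are made up for illustration.

```python
def long_to_wide(long_rows):
    """Pivot (shape_id, veg_code, score) rows into one dict of columns per shape_id."""
    wide = {}
    for shape_id, veg_code, score in long_rows:
        wide.setdefault(shape_id, {})[f"score_{veg_code}"] = score
    return wide


def rejoin(polygons, wide):
    """Attach the wide-table columns to the original polygon attributes by shape_id."""
    return [{**polygon, **wide.get(polygon["shape_id"], {})} for polygon in polygons]


long_rows = [(22315, 1, 0.8), (22315, 4, 1.0), (22316, 2, 0.5)]
polygons = [{"shape_id": 22315}, {"shape_id": 22316}, {"shape_id": 22317}]
for row in rejoin(polygons, long_to_wide(long_rows)):
    print(row)
```

Polygons without any result (22317 in this example) keep their original attributes only, so the join does not drop rows.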

calibration(polygonfile, model)

DriesAdriaens commented 2 years ago

Link to the document that describes the current workflow, which is spread over ArcMap, MS Access and Excel.

johanvdw commented 1 year ago

Added in #295