mapme-initiative / mapme.pipelines

R package to run mapme pipelines configured in YAML

Move indicator parameters to yaml #5

Closed by karpfen 4 months ago

karpfen commented 4 months ago

Currently, most indicator parameters (e.g. min_size in src/gfw.R) are hard-coded in the individual scripts in src/. I propose extending the config.yml file to include these parameters. I'm thinking of something that ends up looking like this:

```yaml
default:
  input: ./data/WDPA_WDOECM_Jun2024_Public.parquet
  ncores: 12
  data_path: ./data
  output_path: ./output
  progress: true
  by_region: false
  timeout: 600
  overwrite: false
  wdpa_ver: "latest"
  params:
    resources:
      get_key_biodiversity_areas:
        path: "./kbas.gpkg"
      get_gfw_treecover:
        version: "GFC-2023-v1.11"
    indicators:
      calc_carbon:
        stats: ["min", "mean", "max"]
      calc_treecover_area:
        years: [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
        min_size: 20
        min_cover: 50
```

Does that make sense to you, @goergen95?

karpfen commented 4 months ago

@goergen95 I noticed one potential issue when I played around with this approach yesterday. Ideally, everything set under params should be validated against the package (mapme.biodiversity), imho.

This means in the case above:

However, available_resources() does not return the actual function name (e.g. get_gfw_treecover) but its "alias", gfw_treecover. We need the alias to check against available_resources()$name, but for the actual call we need the real function name, so we would have to get that mapping from somewhere. I don't think that's possible right now, is it?
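If the function names reliably follow a get_<alias> / calc_<alias> pattern (an assumption inferred from the names in this issue, not something the package is known to guarantee), the mapping could be derived by string manipulation. A minimal sketch; alias_to_function(), function_to_alias() and the known_aliases vector are hypothetical, the latter standing in for available_resources()$name:

```r
# Hypothetical helpers, ASSUMING resource functions are named "get_<alias>"
# and indicator functions "calc_<alias>" (a convention inferred from this
# issue, not a documented guarantee of the package).
alias_to_function <- function(alias, type = c("resource", "indicator")) {
  type <- match.arg(type)
  prefix <- if (type == "resource") "get_" else "calc_"
  paste0(prefix, alias)
}

function_to_alias <- function(fun_name) {
  # Strip a leading "get_" or "calc_" prefix to recover the alias
  sub("^(get|calc)_", "", fun_name)
}

# Hypothetical stand-in for available_resources()$name
known_aliases <- c("gfw_treecover", "key_biodiversity_areas")

# Check the function names configured under params$resources
configured <- c("get_gfw_treecover", "get_key_biodiversity_areas")
stopifnot(all(function_to_alias(configured) %in% known_aliases))
```

If a function ever deviates from the naming pattern, an explicit lookup table would be needed instead.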

goergen95 commented 4 months ago

We will require validation, but I see this happening in two stages:

In summary, I do not see the need to go over available_resources() for this to work.
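One way to check configured parameter names directly against a function, without going through available_resources(), is to compare them with the function's formal arguments. A minimal base-R sketch; unknown_params() is a hypothetical helper, and the stub below is a stand-in for a function that would in practice be fetched from mapme.biodiversity:

```r
# Hypothetical helper: return configured parameter names that are not
# formal arguments of `fun`. Functions taking `...` accept any name,
# so nothing is flagged for them.
unknown_params <- function(fun, params) {
  args <- names(formals(fun))
  if ("..." %in% args) return(character(0))
  setdiff(names(params), args)
}

# Hypothetical stub with the parameters from the proposed config
stub <- function(years, min_size, min_cover) NULL

unknown_params(stub, list(years = 2000:2010, min_size = 20, min_cover = 50))
# returns character(0)
unknown_params(stub, list(min_sise = 20))
# returns "min_sise"
```

Note that formals() returns NULL for primitive functions, so this check only makes sense for ordinary R closures.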

goergen95 commented 4 months ago

Something like this should get the job done:

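```r
# Look up the function in the mapme.biodiversity namespace (works for unexported functions too)
f <- utils::getFromNamespace("get_key_biodiversity_areas", "mapme.biodiversity")
# Call it with the parameters configured under params$resources
f(path = "./kbas.gpkg")
```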
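Generalising this to any entry under params, the parsed YAML parameters (a named list) can be passed with do.call(). A minimal sketch; call_from_config() is a hypothetical helper, and the demo runs against base R so it works without mapme.biodiversity installed:

```r
# Hypothetical helper: fetch `fun_name` from namespace `ns` and call it
# with the named list `params` (e.g. one entry under params$resources).
call_from_config <- function(fun_name, params, ns = "mapme.biodiversity") {
  f <- utils::getFromNamespace(fun_name, ns)
  do.call(f, params)
}

# Runnable demo against base R. With mapme.biodiversity installed, the
# snippet above would become:
#   call_from_config("get_key_biodiversity_areas", list(path = "./kbas.gpkg"))
call_from_config("seq", list(from = 2000, to = 2010, by = 5), ns = "base")
# returns 2000 2005 2010
```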
goergen95 commented 4 months ago

Taking first steps here. Validating the configuration via JSON Schema looks fine. The next step is to figure out how to represent resources and indicators as objects in JSON Schema.

Update: That seems to work now, too. Next up is some R logic to construct proper function calls from the validated YAML and start the processing.
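One possible shape for that R logic, sketched under assumptions: build unevaluated calls from a parsed params section first, so they can be logged or inspected before anything runs. build_calls() is hypothetical, and the nested list is written by hand with the indicator parameters from the proposal at the top of this issue:

```r
# Hypothetical sketch: turn a named list of function -> parameters into
# unevaluated calls of the form ns::fun(param = value, ...).
build_calls <- function(section, ns = "mapme.biodiversity") {
  lapply(names(section), function(fun_name) {
    # Call head, e.g. mapme.biodiversity::calc_carbon
    head <- call("::", as.name(ns), as.name(fun_name))
    as.call(c(list(head), section[[fun_name]]))
  })
}

# Hand-written stand-in for the parsed params$indicators section
indicators <- list(
  calc_carbon = list(stats = c("min", "mean", "max")),
  calc_treecover_area = list(years = 2000:2010, min_size = 20, min_cover = 50)
)

calls <- build_calls(indicators)
print(calls[[1]])
# With the package installed, lapply(calls, eval) would run them.
```

Note that `::` only resolves exported functions, whereas getFromNamespace() also reaches unexported ones; which is preferable depends on whether the pipeline should be restricted to the public API.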

goergen95 commented 4 months ago

I think the schema branch is ready for testing now. :crossed_fingers: