prio-data / viewsforecasting

Jupyter notebooks and python scripts for performing the ViEWS monthly forecasts
https://viewsforecasting.org

Add surrogate models based on ESCWA needs #4

Open hhegre opened 2 years ago

hhegre commented 2 years ago

Add surrogate models based on the list of features provided by @angelicalmcgowan

angelicalmcgowan commented 2 years ago

Predictors ViEWS-ESCWA.xlsx

See "Selected predictors in the API" in the attached file for a first selection of predictors to add to the surrogate models. It lists the predictors we made available to ESCWA in the API following earlier requests.

Once the ESCWA models are added to the ensemble at both cm and pgm, perhaps we can look over this again and switch to (or add) the top-3 features in the feature importance tables for each sub-model, or alternatively the top X features in the ensemble, whichever makes more sense? Those are usually the features we discuss in various reports, so they would be great to have.
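A minimal sketch of the two selection options mentioned above (top-3 per sub-model, or top X across the ensemble), assuming the feature importances are available as one score per feature and sub-model. The model names, feature names, and scores are made up for illustration, and averaging scores across sub-models is only one possible way to rank features at the ensemble level; none of this is taken from the ViEWS pipeline.

```python
from collections import defaultdict

# Hypothetical importances: {sub-model name: {feature name: importance score}}.
importances = {
    "model_a": {"f1": 0.40, "f2": 0.10, "f3": 0.30, "f4": 0.20},
    "model_b": {"f1": 0.05, "f2": 0.50, "f3": 0.10, "f4": 0.35},
}


def top_k_per_model(table, k=3):
    """Return the k highest-scoring features for each sub-model."""
    return {
        model: sorted(scores, key=scores.get, reverse=True)[:k]
        for model, scores in table.items()
    }


def top_k_overall(table, k=3):
    """Return the k features with the highest mean score across sub-models."""
    collected = defaultdict(list)
    for scores in table.values():
        for feature, score in scores.items():
            collected[feature].append(score)
    means = {f: sum(v) / len(v) for f, v in collected.items()}
    return sorted(means, key=means.get, reverse=True)[:k]


per_model = top_k_per_model(importances)
overall = top_k_overall(importances)
```

With these toy numbers the two approaches disagree on the third feature, which is the kind of difference that would need to be settled before choosing one.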

hhegre commented 2 years ago

I can't read the Excel file. Can you just paste the list here? Note that we can't make a surrogate model for every indicator in the API; we should add no more than 3-4 of them. I have already added "general efficiency" from the aquastat dataset.

hhegre commented 2 years ago

We could select features based on feature importances, but I think it might be equally useful to select on a variable's interpretability. Also, we should select features with good data quality and coverage. So the food price indicators are not so useful at this stage...
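One way to make the criteria above concrete is to screen candidates on coverage and an interpretability flag before capping the list. This is only a sketch: the observations, the hand-assigned flags, the 0.7 coverage threshold, and the function names are all assumptions made for illustration.

```python
# Hypothetical yearly observations per candidate feature; None marks a missing value.
observations = {
    "rule_of_law": [0.6, 0.7, 0.7, 0.8],
    "water_efficiency": [12.0, None, 13.5, 14.0],
    "food_price_index": [None, None, 101.2, None],
}

# Hypothetical hand-assigned flag: is the feature easy to explain in a report?
interpretable = {
    "rule_of_law": True,
    "water_efficiency": True,
    "food_price_index": True,
}


def coverage(values):
    """Share of observations that are not missing."""
    return sum(v is not None for v in values) / len(values)


def select_features(data, flags, min_coverage=0.7, max_features=4):
    """Keep interpretable features with enough coverage, best-covered first."""
    eligible = [
        name
        for name, values in data.items()
        if flags.get(name, False) and coverage(values) >= min_coverage
    ]
    eligible.sort(key=lambda name: coverage(data[name]), reverse=True)
    return eligible[:max_features]


selected = select_features(observations, interpretable)
```

In this toy data the sparsely observed price series is dropped on coverage alone, even though it is flagged as interpretable.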

Which constituent models did we use to have in the ESCWA API?

angelicalmcgowan commented 2 years ago

ESCWA variables in the API:

Codebook can be found here: https://api.viewsforecasting.org/escwa_data_2021_10_01/codebook

ESCWA models at cm:

cm_sb_cflong_global_calibrated

cm_sb_escwa_onset_global_calibrated

cm_sb_vdem_global_calibrated

cm_sb_wdi_global_calibrated

cm_sb_escwa_global_calibrated

cm_sb_aquastat_global_calibrated

cm_sb_food_global_calibrated

cm_sb_faostat_global_calibrated

cm_sb_imfweo_global_calibrated

ESCWA models at pgm:

pgm_sb_sptime_ame_calibrated

pgm_sb_naturalsocial_ame_calibrated

pgm_sb_crop_drought_ame_calibrated

pgm_sb_vulnerability_ame

pgm_sb_crop_drought_vulnerability_ame

pgm_sb_combined_ame_calibrated

pgm_sb_crosslevel_ame_calibrated

You can also find the list of models in the API here: https://api.viewsforecasting.org/escwa_2021_12_01

Codebook: https://api.viewsforecasting.org/escwa_2021_12_01/codebook
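The model names listed above appear to share a pattern: a level prefix (cm or pgm) and, for most of them, a trailing "calibrated". The sketch below splits names on that basis so the lists can be compared programmatically. The reading of the naming convention is inferred from the names in this comment only, and the function name is hypothetical.

```python
def parse_model_name(name):
    """Split a model name into level, calibration flag, and remaining label."""
    parts = name.split("_")
    level = parts[0]  # "cm" or "pgm" in the lists above
    calibrated = parts[-1] == "calibrated"
    label = parts[1:-1] if calibrated else parts[1:]
    return {"level": level, "label": "_".join(label), "calibrated": calibrated}


# A few names copied from the lists above.
names = [
    "cm_sb_vdem_global_calibrated",
    "pgm_sb_sptime_ame_calibrated",
    "pgm_sb_vulnerability_ame",
]
parsed = [parse_model_name(n) for n in names]

# Names without the "calibrated" suffix, for a quick consistency check.
uncalibrated = [n for n, p in zip(names, parsed) if not p["calibrated"]]
```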




E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy

angelicalmcgowan commented 2 years ago

Ok. Would it be possible to create surrogate models based on both feature importances (in time, when ready) and interpretability so we have the full pool available if needed for in-depth reports, but then only upload the best 3-4 to the API? Or is that perhaps too much wishful thinking?




hhegre commented 2 years ago

It is not a problem to increase the number of surrogate models. In principle we can also have more of them than we have in the API, but I think it is probably better to have the same set everywhere.

The most important constraint is that we have to get the update schedule in place. Most of the ESCWA features are not updated in our routines. Some of them, like aquastat, are not updated at the source either. We should do a careful review of this after the summer and revise our routines. For now, I will include water services efficiency and rule of law in the surrogate models.

angelicalmcgowan commented 2 years ago

Ok, sounds good. It would be great to include the update schedules, both the data provider's and ours, in the codebook as it's being written (for our own documentation as well as for external transparency).
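As a rough illustration of what such a codebook entry could look like, here is a sketch that records both update schedules per feature and lists the features where either one is missing. The class, field names, and entries are hypothetical placeholders, not the actual codebook format or real schedules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodebookEntry:
    """One feature's documentation, including who updates it and how often."""

    feature: str
    source: str
    provider_schedule: Optional[str]  # None: not updated at the source
    internal_schedule: Optional[str]  # None: not covered by our update routines


# Hypothetical entries; the schedules are placeholders, not real values.
codebook = [
    CodebookEntry("feature_a", "source_x", "annual", "monthly"),
    CodebookEntry("feature_b", "source_y", "annual", None),
    CodebookEntry("feature_c", "source_z", None, None),
]


def needs_review(entries):
    """Names of features missing an update schedule on either side."""
    return [
        e.feature
        for e in entries
        if e.provider_schedule is None or e.internal_schedule is None
    ]


to_review = needs_review(codebook)
```

Keeping the two schedules as separate fields makes it possible to tell features that are stale at the source apart from features that are simply not yet in our routines.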


