matthiasdemuzere / w2w

A python tool that ingests WUDAPT information into WRF.
MIT License
39 stars 16 forks source link

[custom UCP] LCZ parameter error #92

Closed FloBre33 closed 1 year ago

FloBre33 commented 2 years ago

Hi,

I'm trying to run w2w with a custom UCP table but run into the following issue.

w2w -n 25 ./ lcz_wudapt_ponche.tif geo_em.d04.nc --lcz-ucp custom_lcz_ucp.csv
--> Set data, arguments and files
> Seems you are using a LCZ map produced by https://lcz-generator.rub.de/
> I therefor use the Gaussian filtered default 'LczFilter' layer
--> Check LCZ integrity, in terms of class labels, projection and extent
> LCZ labels as expected (1 to 17)
ERROR 1: PROJ: proj_identify: Cannot find proj.db
ERROR 1: PROJ: proj_identify: Cannot find proj.db
> LCZ provided as WGS84 (EPSG:4326)
> LCZ domain is covering WRF domain
ERROR 1: PROJ: proj_create_from_name: Cannot find proj.db
--> Replace WRF MODIS urban LC with surrounding natural LC
Looping through urban grid pixels: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 197/197 [00:01<00:00, 136.76it/s]
--> Create LCZ-based geo_em file
Traceback (most recent call last):
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'MH_URB2D'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home2020/home/geo/fbreton/w2w_netcdf3/bin/w2w", line 8, in <module>
    sys.exit(main())
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 179, in main
    nbui_max = create_lcz_params_file(
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 1226, in create_lcz_params_file
    dst_data = _add_frc_lu_index_2_wrf(
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 1152, in _add_frc_lu_index_2_wrf
    frc_urb = _ucp_resampler(
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 678, in _ucp_resampler
    SW, BW = _get_SW_BW(ucp_table)
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 630, in _get_SW_BW
    SW = ucp_table['MH_URB2D'] / ucp_table['H2W']
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'MH_URB2D'

The custom UCP table is below.

   ,FRC_URB2D ,MH_URB2D_MIN ,MH_URB2D ,MH_URB2D_MAX ,BLDFR_URB2D ,H2W
1  ,0.95      ,40           ,50       ,60           ,0.5         ,2.25
2  ,0.9       ,15           ,17.5     ,20           ,0.55        ,0.8
3  ,0.85      ,5            ,6.4      ,10           ,0.55        ,0.67
4  ,0.65      ,40           ,50       ,60           ,0.3         ,1.17
5  ,0.7       ,15           ,17.5     ,20           ,0.3         ,0.67
6  ,0.6       ,5            ,6.4      ,10           ,0.3         ,0.49
7  ,0.85      ,5            ,5        ,5            ,0.75        ,0.2
8  ,0.85      ,5            ,6.4      ,10           ,0.4         ,0.22
9  ,0.3       ,5            ,6.4      ,10           ,0.15        ,0.148
10 ,0.55      ,5            ,10       ,15           ,0.25        ,0.42
11 ,0         ,0            ,0        ,0            ,0           ,0
12 ,0         ,0            ,0        ,0            ,0           ,0
13 ,0         ,0            ,0        ,0            ,0           ,0
14 ,0         ,0            ,0        ,0            ,0           ,0
15 ,1         ,5            ,5        ,5            ,0.05        ,1
16 ,0         ,0            ,0        ,0            ,0           ,0
17 ,0         ,0            ,0        ,0            ,0           ,0

Is this some kind of syntax error? I just copied the default UCP table and changed the values.

Florentin

jkittner commented 2 years ago

hey, please use the issue template next time, it's there for a reason :)

This is a bug on our end indeed. We don't handle the non standard, padded-out csv format properly. If you remove the spaces e.g. like this, it should work fine.

,FRC_URB2D,MH_URB2D_MIN,MH_URB2D,MH_URB2D_MAX,BLDFR_URB2D,H2W
1,0.95,40,50,60,0.5,2.25
2,0.9,15,17.5,20,0.55,0.8
3,0.85,5,6.4,10,0.55,0.67
4,0.65,40,50,60,0.3,1.17
5,0.7,15,17.5,20,0.3,0.67
6,0.6,5,6.4,10,0.3,0.49
7,0.85,5,5,5,0.75,0.2
8,0.85,5,6.4,10,0.4,0.22
9,0.3,5,6.4,10,0.15,0.148
10,0.55,5,10,15,0.25,0.42
11,0,0,0,0,0,0
12,0,0,0,0,0,0
13,0,0,0,0,0,0
14,0,0,0,0,0,0
15,1,5,5,5,0.05,1
16,0,0,0,0,0,0
17,0,0,0,0,0,0

If you want to have a look, it's this line that's initially causing it and needs fixing. https://github.com/matthiasdemuzere/w2w/blob/a90c686fb4bb8aa24e3b06c384b0761a0409e8f5/w2w/w2w.py#L156 When you look at ucp_table.columns, there's most likely trailing whitespace somewhere causing the column names to be incorrect. We can probably use this to fix it: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.strip.html

FloBre33 commented 2 years ago

Thank you for the superfast response! :) (and sorry for the template, guilty)

I removed the spaces and reran the code.

(w2w_netcdf3) (base) [fbreton@hpc-login1 w2w_ncdf3_wudapt]$ w2w -n 25 ./ lcz_wudapt_ponche.tif geo_em.d04.nc --lcz-ucp custom_lcz_ucp.csv
--> Set data, arguments and files
> Seems you are using a LCZ map produced by https://lcz-generator.rub.de/
> I therefor use the Gaussian filtered default 'LczFilter' layer
--> Check LCZ integrity, in terms of class labels, projection and extent
> LCZ labels as expected (1 to 17)
ERROR 1: PROJ: proj_identify: Cannot find proj.db
ERROR 1: PROJ: proj_identify: Cannot find proj.db
> LCZ provided as WGS84 (EPSG:4326)
> LCZ domain is covering WRF domain
ERROR 1: PROJ: proj_create_from_name: Cannot find proj.db
--> Replace WRF MODIS urban LC with surrounding natural LC
Looping through urban grid pixels: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 197/197 [00:01<00:00, 150.44it/s]
--> Create LCZ-based geo_em file
> Processing LP_URB2D ...
> Processing MH_URB2D ...
> Processing STDH_URB2D ...
> Processing HGT_URB2D ...
> Processing LB_URB2D ...
> Processing LF_URB2D ...
> Processing HI_URB2D ...
Traceback (most recent call last):
  File "/home2020/home/geo/fbreton/w2w_netcdf3/bin/w2w", line 8, in <module>
    sys.exit(main())
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 179, in main
    nbui_max = create_lcz_params_file(
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 1272, in create_lcz_params_file
    ucp_res, nbui_max = _hi_resampler(
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 971, in _hi_resampler
    df_hi = _compute_hi_distribution(info, ucp_table=ucp_table)
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 916, in _compute_hi_distribution
    hi_sample = _get_truncated_normal_sample(
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/w2w/w2w.py", line 833, in _get_truncated_normal_sample
    hi_sample = hi_inst.rvs(SAMPLE_SIZE)
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py", line 467, in rvs
    return self.dist.rvs(*self.args, **kwds)
  File "/home2020/home/geo/fbreton/w2w_netcdf3/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py", line 1066, in rvs
    raise ValueError(message)
ValueError: Domain error in arguments. The `scale` parameter must be positive for all distributions, and many distributions have restrictions on shape parameters. Please see the `scipy.stats.truncnorm` documentation for details.

The new issue seems to be with statistical fitting that is unhappy with the input parameters. The input parameters could be out of bounds of what's expected by the statistical model, or the model expects varying parameter values (e.g. my LCZ 15 with constant building height broke the computation).

It seems related to this part of the JOSS paper.

HI_URB2D is obtained by fitting a bounded normal distribution to the minimum (MH_URB2D_MIN), mean (MH_URB2D), and maximum (MH_URB2D_MAX) building height, as provided in LCZ_UCP_default.csv. The building height standard deviation is also required and is approximated as (MH_URB2D_MAX - MH_URB2D_MIN) / 4.

I would assume from the last line that the model cannot work with constant building height per LCZ class. So to make it work, I just changed a little the min and max of building heights (LCZs 7 and 15) for the model to be happy with the standard deviation. :)

,FRC_URB2D,MH_URB2D_MIN,MH_URB2D,MH_URB2D_MAX,BLDFR_URB2D,H2W
1,0.95,40,50,60,0.5,2.25
2,0.9,15,17.5,20,0.55,0.8
3,0.85,5,6.4,10,0.55,0.67
4,0.65,40,50,60,0.3,1.17
5,0.7,15,17.5,20,0.3,0.67
6,0.6,5,6.4,10,0.3,0.49
7,0.85,4.9,5,5.1,0.75,0.2
8,0.85,5,6.4,10,0.4,0.22
9,0.3,5,6.4,10,0.15,0.148
10,0.55,5,10,15,0.25,0.42
11,0,0,0,0,0,0
12,0,0,0,0,0,0
13,0,0,0,0,0,0
14,0,0,0,0,0,0
15,1,4.9,5,5.1,0.05,1
16,0,0,0,0,0,0
17,0,0,0,0,0,0

This gives OK results.

(w2w_netcdf3) (base) [fbreton@hpc-login1 w2w_ncdf3_wudapt]$ w2w -n 25 ./ lcz_wudapt_ponche.tif geo_em.d04.nc --lcz-ucp custom_lcz_ucp.csv
--> Set data, arguments and files
> Seems you are using a LCZ map produced by https://lcz-generator.rub.de/
> I therefor use the Gaussian filtered default 'LczFilter' layer
--> Check LCZ integrity, in terms of class labels, projection and extent
> LCZ labels as expected (1 to 17)
ERROR 1: PROJ: proj_identify: Cannot find proj.db
ERROR 1: PROJ: proj_identify: Cannot find proj.db
> LCZ provided as WGS84 (EPSG:4326)
> LCZ domain is covering WRF domain
ERROR 1: PROJ: proj_create_from_name: Cannot find proj.db
--> Replace WRF MODIS urban LC with surrounding natural LC
Looping through urban grid pixels: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 197/197 [00:01<00:00, 151.38it/s]
--> Create LCZ-based geo_em file
> Processing LP_URB2D ...
> Processing MH_URB2D ...
> Processing STDH_URB2D ...
> Processing HGT_URB2D ...
> Processing LB_URB2D ...
> Processing LF_URB2D ...
> Processing HI_URB2D ...
--> Create LCZ-based urban extent geo_em file (excluding other LCZ-based info)
--> Expanding land categories of parent domain(s) to 41

--> Start sanity check and clean-up ...
> Check 1: Urban class removed from geo_em.d04_NoUrban.nc? OK
> Check 2: LCZ Urban extent present in geo_em.d04_LCZ_extent.nc? OK
> Check 3: Urban LCZ classes exists in geo_em.d04_LCZ_params.nc? OK: LCZ Classes ([2, 4, 5, 6, 8]) present
> Check 4: FRC_URB2D present in geo_em.d04_LCZ_params.nc? OK: FRC_URB2D values range between 0.00 and 0.86
> Check 5: URB_PARAMS matrix present in file geo_em.d04_LCZ_params.nc? OK
> Check 6: Do URB_PARAM variable values follow expected range in geo_em.d04_LCZ_params.nc?
   + OK for LP_URB2D
   + OK for MH_URB2D
   + OK for STDH_URB2D
   + OK for HGT_URB2D
   + OK for LB_URB2D
   + OK for LF_URB2D
> Check 7: Does HI_URB2D sum to 100% for urban pixels in geo_em.d04_LCZ_params.nc? OK
> Check 8: Do FRC_URB and LCZs (from LU_INDEX) cover same extent in geo_em.d04_LCZ_params.nc? OK
> Check 9: Extent and # urban pixels same for *_extent.nc and *_params.nc output file? OK, urban extent the same (196)
> Cleaning up ... all done!

 ----------- !! NOTE !! ---------
 Set nbui_max to 6 during compilation, in order to optimize memory storage.

Cheers, Flo

matthiasdemuzere commented 2 years ago

Hi @FloBre33,

Thanks for reporting about this.

To be honest, this user-specific template functionality has not been tested extensively before. E.g., I haven't really tested the influence of changing the parameter values downstream, such as in the building height distribution derivation ...

But your assessment is correct: scipy.stats.truncnorm - that is used to estimate the building height probabilities per height bin - builds a normal distribution around a min, mean, max, yet also requires a standard deviation. If all heights are the same, no distribution can be developed, and the system will crash.

I'll have to check how to make this clear. Either in the documentation, or as a warning in case someone uses a user-specific look-up table!

jkittner commented 1 year ago

via #95