openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

GSP polygons are wrong #187

Closed dfulu closed 1 year ago

dfulu commented 1 year ago

Describe the bug

The get_gsp_id_to_shape(get_gsp_id_to_shape, sheffield_solar_region_path) function produces a table in which the wrong polygons are associated with each GSP ID. This has a fairly big effect on the GSP data pipeline, since we use the polygons to select the location for the cropped satellite and NWP images.

This happens when the function is used with our default arguments:

gsp_shapes = get_gsp_id_to_shape(
    get_gsp_id_to_shape=get_gsp_metadata_from_eso(), 
    sheffield_solar_region_path=get_gsp_shape_from_eso(),
)

This is ultimately caused by the dataframe loaded from sheffield_solar_region_path containing multiple rows for some GSP names (the GSPs column). The duplicated entries are shown below.

                       GSPs GSPGroup  RegionID       geometry
index                                                        
4      ACTL_C|WISD_1|WISD_6       _A         5  MULTIPOLYG...
5      ACTL_C|WISD_1|WISD_6       _C         6  MULTIPOLYG...
13                    AXMI1       _H        14  POLYGON ((...
14                    AXMI1       _L        15  MULTIPOLYG...
18            BARKC1|BARKW3       _A        19  MULTIPOLYG...
19            BARKC1|BARKW3       _C        20  MULTIPOLYG...
25                   BESW_1       _E        26  MULTIPOLYG...
26                   BESW_1       _B        27  MULTIPOLYG...
29                   BISW_1       _K        30  MULTIPOLYG...
30                   BISW_1       _E        31  MULTIPOLYG...
44                   BRIM_1       _A        45  MULTIPOLYG...
45                   BRIM_1       _C        46  POLYGON ((...
108                  ECLA_1       _B       109  MULTIPOLYG...
109                  ECLA_1       _E       110  POLYGON ((...
123                  FECK_6       _B       124  MULTIPOLYG...
124                  FECK_6       _E       125  MULTIPOLYG...
142                  GREN_1       _A       143  MULTIPOLYG...
143                  GREN_1       _B       144  POLYGON ((...
150                  HAMHC1       _E       151  MULTIPOLYG...
151                  HAMHC1       _B       152  MULTIPOLYG...
164                   IROA1       _E       165  POLYGON ((...
165                   IROA1       _L       166  MULTIPOLYG...
209                  MELK_1       _H       210  MULTIPOLYG...
210                  MELK_1       _L       211  POLYGON ((...
251                  RAIN_1       _G       252  POLYGON ((...
252                  RAIN_1       _D       253  POLYGON ((...
254                  RASS_1       _K       255  POLYGON ((...
255                  RASS_1       _E       256  MULTIPOLYG...
290                    STHA       _P       291  POLYGON ((...
291                    STHA       _N       292  MULTIPOLYG...
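
A listing like the one above can be reproduced with a duplicate check on the GSPs column. The following is a minimal sketch on a toy table, not the actual region file: the first four rows are copied from the listing, the `XXXX_1` row is a hypothetical unduplicated entry added for contrast, and the geometry column is left out for brevity.

```python
import pandas as pd

# Toy stand-in for the region table. The first four rows come from the
# listing above; "XXXX_1" is a hypothetical row that has no duplicate.
regions = pd.DataFrame(
    {
        "GSPs": [
            "ACTL_C|WISD_1|WISD_6",
            "ACTL_C|WISD_1|WISD_6",
            "AXMI1",
            "AXMI1",
            "XXXX_1",
        ],
        "GSPGroup": ["_A", "_C", "_H", "_L", "_Z"],
        "RegionID": [5, 6, 14, 15, 999],
    },
    index=[4, 5, 13, 14, 500],
)

# keep=False flags every row of a duplicated group, not only the repeats.
duplicates = regions[regions["GSPs"].duplicated(keep=False)]
print(duplicates)

# RegionID is unique per row, so grouping on it cannot merge anything,
# whereas grouping on GSPs yields fewer groups than there are rows.
n_region_groups = regions["RegionID"].nunique()
n_gsp_groups = regions["GSPs"].nunique()
print(n_region_groups, n_gsp_groups)
```

In this toy table every row has its own RegionID, while the GSPs column has fewer distinct values than rows.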

The function includes logic to combine rows that share a RegionID. However, each row has a different RegionID even when the GSPs value is the same, so that logic never merges these duplicates. This may be a leftover from before the GSPs changed; perhaps it worked then.

This should be easily solved by combining the rows with the same GSPs entry.
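
One possible way to do this, assuming the table is a geopandas GeoDataFrame, is to dissolve on the GSPs column. This is only a sketch and not the fix that was applied in the repository: the geometries are made-up squares, the `XXXX_1` row is hypothetical, and keeping the first GSPGroup and RegionID of each group is an arbitrary choice.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: two unit squares sharing an edge stand in for the
# two polygons of one duplicated GSPs entry; the third row has no duplicate.
regions = gpd.GeoDataFrame(
    {
        "GSPs": ["AXMI1", "AXMI1", "XXXX_1"],
        "GSPGroup": ["_H", "_L", "_Z"],
        "RegionID": [14, 15, 999],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# dissolve unions the geometries within each GSPs group. The other columns
# keep the first value found in the group (the default aggfunc="first").
combined = regions.dissolve(by="GSPs", as_index=False)
print(combined[["GSPs", "RegionID"]])
```

After the dissolve each GSPs value appears once, with a single geometry covering the union of its original polygons. Which RegionID and GSPGroup to keep for a merged row is a separate decision that this sketch does not settle.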

dfulu commented 1 year ago

@JackKelly @peterdudfield @dantravers

JackKelly commented 1 year ago

Great work for spotting this bug, @dfulu!