guidocioni / point_wx


Change data selection method #100

Open guidocioni opened 5 months ago

guidocioni commented 5 months ago

When several columns share the same base name, e.g. precipitation and precipitation_member1 for ensemble models, or precipitation and precipitation_icon for deterministic models, we always select the relevant columns with data.loc[:, data.columns.str.contains('precipitation')].

However, this is error-prone whenever other columns contain the same string, because contains does substring matching. I think we should switch to something more robust, such as a regular expression on the column names.
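To illustrate the failure mode, here is a minimal sketch. The precipitation_probability column is a hypothetical example of an unrelated variable that happens to share the substring; it is not taken from the actual data.

```python
import pandas as pd

# Toy dataframe; 'precipitation_probability' is a hypothetical unrelated column
data = pd.DataFrame({
    'precipitation': [1.2],
    'precipitation_member1': [0.8],
    'precipitation_probability': [60],
})

# Substring matching selects every column that contains the string
mask = data.columns.str.contains('precipitation')
picked = list(data.loc[:, mask].columns)
print(picked)
# ['precipitation', 'precipitation_member1', 'precipitation_probability']
```

The unrelated column is picked up together with the ones we actually want.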

guidocioni commented 3 weeks ago

For the ensemble I'm now using a stricter regex-based selection, since the contains approach was causing issues, as shown in https://github.com/guidocioni/point_wx/issues/157.

The fix is in a56dab69764952c47b7a5b3c51bc121acdc8a9c5

For example if var='rain', then we can write

columns_regex = rf'{var}$|{var}_member(0[1-9]|[1-9][0-9])$'

and filter the columns from the dataframe as

df.loc[:, df.columns.str.match(columns_regex)]

This should not cause issues, but it is a bit verbose. Would be cool to maybe pack it into a function?
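A possible shape for such a function, as a rough sketch only: the name select_variable_columns is hypothetical and not something that exists in the repo. It assumes members are named with a two-digit suffix (member01 to member99), as in the regex above. It also adds re.escape and a non-capturing group, which are my additions and not part of the fix commit.

```python
import re

import pandas as pd


def select_variable_columns(df: pd.DataFrame, var: str) -> pd.DataFrame:
    """Hypothetical helper: keep `var` and its `var_memberNN` columns only."""
    # Escape the name so regex metacharacters in it are matched literally
    name = re.escape(var)
    # Same pattern as above, with a non-capturing group for the member number
    columns_regex = rf'{name}$|{name}_member(?:0[1-9]|[1-9][0-9])$'
    # str.match anchors at the start of the name, '$' anchors at the end
    return df.loc[:, df.columns.str.match(columns_regex)]


# Toy dataframe; 'rain_probability' is a hypothetical unrelated column
df = pd.DataFrame({
    'rain': [0.0],
    'rain_member01': [0.1],
    'rain_member02': [0.2],
    'rain_probability': [40],
})

selected = select_variable_columns(df, 'rain')
print(list(selected.columns))
# ['rain', 'rain_member01', 'rain_member02']
```

The deterministic pages would need a different suffix pattern (e.g. model names instead of member numbers), so the suffix part could become an argument of the function.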

We also need to implement similar logic on the other pages that still use contains.