pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License
452 stars 165 forks source link

Refactor the blending/steps.py nowcasting code #395

Open ladc opened 1 month ago

ladc commented 1 month ago

Start with low-hanging fruit:

dnerini commented 1 month ago

Yes please!

sidekock commented 1 month ago

@RubenImhoff I went trough the whole code to see what was unclear for me. I will already paste my findings here and maybe we can discuss later today. Ill first time some time to eat :p

Specific lines of needed name changes

Line 583: R_f should get a new name

Line 600: Rf empty_precip?

Line 616: outarr

Line 677: pp, generate_noise, noise_std_coeffs = _init_noise(

Line 716: randgen_prec, vps, generate_vel_noise = _init_random_generators(

Line 727: D, D_Yn, D_pb, R_f, R_m, mask_rim, struct, fft_objs = _prepare_forecast_loop(

Line 784: forecast_prev = precip_cascade noise_prev = noise_cascade t_prev = [0.0 for j in range(n_ens_members)] t_total = [0.0 for j in range(n_ens_members)]

Line 790: This timestep function is extremly complicated to understand, maybe changing some names would make this more clear

Line 858 rho_extr would change to rho_extrap (just a bit more clear what this is)

Line 865 Rf should be renamed: R_f_temp

Line 922: cov -> covariance_radar_model?

Line 955: EPS should be changed, quite unclear what it is

Line 997: EPS_ should be changed

Line 1028: R_f_ip

Line 1033: Yn_ip

Line 1051: t_diff_prev

Line 1065: V_stack

Line 1073: Vmodel

Line 1094: R_f_im_recomp

Line 1111: R_f_eprecomp

Line 1118: R_f_ep_recomp

Line 1119: temp_mask

Line 1122: R_f_ep

Line 1144: Yn_ip_recomp

Line 1158: Yn_ep_recomp

Line 1159: Yn_ep

Line 1173: R_f_ep_out, Yn_ep_out, R_pm_ep

Line 1285: R_f_out

Line 1339: cascadesstacked

Line 1356: R_f_blended

Line 1361: R_f_blended_mod_only

Line 1384: R_f_new

Line 1391: R_f_new_mod_only

Line 1412: weights_pm, weights_pm_normalized, weights_pm_mod_only,

Line 1438 Better commenting?

Line 1439: R_pm_blended

Line 1447: R_pm_blended_mod_only

Line 1490: Can be removed

line 1517: R_cmin

Line 1525: MASK_prec -> MASK_precip or MASK_precip_cell?

Line 1550: mu_0, MASK, mu_fct

Line 1578: R_f_stacked

General: All fc should be changed to forecast?

We should look at all these t_... variables, they are quite strange

Do we go for mu and sigma or means and sigmas? Should be uniform choice

R_f = precip_forecast? is this correct?

sidekock commented 1 month ago

For all of these, I could not find the names in the nowcast steps but it is really hard to see the variables through the forest :) . I also tried to do this matching by using some generative AI tools but it does not preform well on this sort of tasks

RubenImhoff commented 1 month ago

To give a brief answer (already):

General: All fc should be changed to forecast? --> Yes!

We should look at all these t_... variables, they are quite strange --> Fair, if doable :)

Do we go for mu and sigma or means and sigmas? Should be uniform choice --> I would go for means and sigmas then :)

R_f = precip_forecast? is this correct? --> Correct

dnerini commented 1 month ago

short comment: when there are so many variables given and returned by functions, it's usually a good indication that you need a class.

so why not refactoring the code into a class and so that the methods can simply access attributes instead of having to pass them around all the time? the main interface can stay the same, that is, forecast()

sidekock commented 1 month ago

I agree this is the most elegant solution but maybe we should tackle that problem after name changes and changing the monolith into something more readable?

sidekock commented 1 month ago

My work today so the refactoring can be done in the future; things like splitting up the code into more modular functions or making a class can be added after the current steps I list below:

STEP 1: All "R_f" should be changed to "precip_forecast". Use CTRL+F to find all relevant changes in both code and comments STEP 2: Specific names need to be changed; due to step 1, some of them have been changed from the original. Below I list all changed needed and some proposed names (can definitely be changed). The check with CTRL F is added because refactoring in one place does not always guarantee the name is changed in every function.

Line number Old name Name after step 1 Proposed name Check with Ctrl F
600 Rf precipforecast precip_forecast No
616 outarr precip_forecast_all_members_all_times No
677 pp generate_perturb Yes
716 randgen_prec randgen_precip Yes
716 vps velocity_perturbations Yes
727, 2321 R_m precip_forecast_non_perturbed
727, 2303 D previous_displacement No
727, 2304 D_Yn previous_displacement_noise_cascade No
727, 2305 D_pb previous_displacement_prob_matching No
761, 859 rho_extr_prev rho_extrap_cascade_prev No
762, 882, 892 rho_extr rho_extrap_cascade No
784 forecast_prev precip_forc_prev_subtimestep No
785 noise_prev noise_prev_subtimestep No
786 t_prev t_prev_timestep No
787 t_total t_leadtime_since_start_forecast No
922 cov covariance_nwp_models No
955, 2387 EPS epsilon_decomposed No
955, 964, 2387, 2391 epsilon_decomposed epsilon_ No
977, 2403 EPS_ epsilon_temp No
1026 t_diff_prev_int t_diff_prev_subtimestep_int No
1028 R_f_ip precip_forecast_ip precip_forecast_cascade_subtimestep No
1033 Yn_ip noise_cascade_subtimestep No
1051 t_diff_prev t_diff_prev_subtimestep No
1059 velocity_pert velocity_perturbations_extrapolation No
1065 V_stack velocity_stack_all No
1073 Vmodel velocity_models No
1094 R_f_ip_recomp precip_forecast_ip_recomp precip_forecast_recomp_subtimestep No
1111 R_f_eprecomp precip_forecast_eprecomp precip_forecast_extrapolated_recomp_subtimestep_temp No
1118 R_f_ep_recomp precip_forecast_ep_recomp precip_forecast_extrapolated_recomp_subtimestep No
1119 temp_mask ???
1122 R_f_ep precip_forecast_ep precip_forecast_extrapolated_decomp No
1144 Yn_ip_recomp noise_cascade_recomp No
1151 Yn_eprecomp noise_extrapolated_recomp_temp No
1158 Yn_ep_recomp noise_extrapolated_recomp No
1159 Yn_ep noise_extrapolated_decomp No
1173 R_f_ep_out precip_forecast_ep_out precip_forecast_extrapolated_decomp_done No
1147 Yn_ep_out noise_extrapolated_decomp_done No
1192 R_ precip_forecast No
1194 R_pm_ep precip_forecast_extrapolated_probability_matching_temp No
1201 R_pmep precip_forecast_extrapolated_probability_matching No
1285 R_f_out precip_forecast_out final_blended_forecast No
1294 cascades_stacked cascade_stack_all_components No
1339 cascadesstacked cascade_stack_all_components_temp No
1347 covariance_nwp_models covariance_all_components No
1356 R_f_blended precip_forecast_blended precip_forecast_blended No
1361 R_f_blended_mod_only precip_forecast_blended_mod_only precip_forecast_blended_mod_only No
1384 R_f_new precip_forecast_new precip_forecast_recomposed No
1391 R_f_new_mod_only precip_forecast_new_mod_only precip_forecast_recomposed_mod_only No
1412 weights_pm weights_probability_matching No
1413 weights_pm_normalized weights_probability_matching_normalized No
1415 weights_pm_mod_only weights_probability_matching_mod_only No
1438 weights_pm_normalized_mod_only weights_probability_matching_normalized_mod_only No
1439 R_pm_blended precip_forecast_probability_matching_blended No
1447 R_pm_blended_mod_only precip_forecast_probability_matching_blended_mod_only No
1517 R_cmin precip_forecast_min_value No
1525 MASK_prec precip_field_mask No
1550 Mu_0 mean_probabiltity_matching_forecast No
1551 MASK no_rain_mask No
1552 mu_fct mean_precip_forecast No
1578 R_f_stacked precip_forecast_stacked precip_forecast_final No
1955 R_min precip_forecast_min No
1998 R_d precip_forecast_decomp No
2000 R_ precip_forecast No
2014 R_c precip_forecast_cascades No
2098 R_c precip_forecast_cascades No
2165 R_models_pm precip_forecast_probability_matching No
2291 R_c precip_forecast_cascades No
2321 R_m ???? ???
2333 R_c precip_forecast_cascades No

STEP 3: Splitting up the code into separate functions (I might add some suggestions here tomorrow)

STEP 4: Changing to a class will make keeping track of all these variables less necessary.

Please provide me with some feedback.

sidekock commented 1 month ago

I just went trough the code to try and refactor it. A version This is just a proof of concept is now pushed to try and show my suggestions (however with the old names as I did not push my version with name changes yesterday...) Below I will post the suggestions I applied to the code to have a clear visual what to do once the master branch can be refactored.

sidekock commented 1 month ago

Refactoring Plan

Part in Code Suggested Name Extra Info/Recommendations
Everything from start till just before #1 preprocessing Check if all variables are still found in function: zerovalue
Everything from #2 to just before #2.3 compute_cascades Check if all variables are still found for the function: mu_extrapolation, sigma_extrapolation
After 2.3.1 in if zero_precip_radar and zero_model_fields: no_rain_radar_and_nwp
Bring #3 initialisze_noise_methods
Bring #6 into one function except for last line initiate_all_random_generators Check if all variables are still found: randgen_prec, vps, generate_vel_noise, mu_noise, sigma_noise, D, D_Yn, D_pb, R_f, R_m, mask_rim, struct, fft_objs
Everything in worker should be first extracted to a worker function - Not a nested definition but a separate function
Then worker should be split up further -
Bring 8.1.2 determine_skill_nwp_at_leadtime
Bring 8.2 calculate_weights_per_component
Bring 8.3.1 calculcate_epsilon_decomposed
8.3.2 iterate_ar_extrapolation_per_cascade
8.3.3 iterate_ar_noise_per_cascade
For loop in 8.4 extrapolate_per_timestep
Following for loop just before 8.5 advect_forecast_one_timestep_no_subtimesteps
In 8.5 if blend_nwp_members: else ... concatinate_cascade_mean_and_sigmas
if weights_method == "spn" weights_method_spn
Before 8.6 blend_all_cadcades
8.6 recompose_cascades_to_precip_field
8.7.1 calculate_probability_matching_member
8.7.1 calculate_forecast_probabilistic_member
8.7.2 apply_probability_matching