informatics-lab / precip_rediagnosis

Project to use ML to re-diagnose precipitation fields from ensemble model fields

Data prep - Storm dennis #2

Closed stevehadd closed 2 years ago

stevehadd commented 2 years ago

Here is my code for the initial data wrangling for our first Storm Dennis notebook.

Key files

This PR is also the place for any comments on the actual data produced. Please see the OneNote page for info on where to find the data.

Future work

Close #1


hannahbrown7 commented 2 years ago

I agree with the comment in the documentation about future work to make extract_mass.py and extract_mass_radar.py more general, as both have some hard-coded inputs, including file paths and parameters. Otherwise, both scripts look fine.
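As a rough illustration of what "more general" could mean here, the sketch below moves hard-coded paths and parameters into command-line arguments using argparse. It is not taken from extract_mass.py or extract_mass_radar.py; the option names (`--input-dir`, `--output-dir`, `--variables`) and the example values are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse extraction settings from the command line.

    All option names here are hypothetical; they stand in for whatever
    paths and parameters are currently hard coded in the scripts.
    """
    parser = argparse.ArgumentParser(
        description="Illustrative CLI for a data extraction script."
    )
    parser.add_argument("--input-dir", type=Path, required=True,
                        help="Directory containing the source files.")
    parser.add_argument("--output-dir", type=Path, default=Path("."),
                        help="Directory to write extracted data to.")
    parser.add_argument("--variables", nargs="+", default=[],
                        help="Names of the fields to extract.")
    return parser.parse_args(argv)


# Example invocation with an explicit argument list (made-up values).
args = parse_args(["--input-dir", "/tmp/in", "--variables", "rainfall", "snowfall"])
print(args.input_dir, args.output_dir, args.variables)
```

Passing `argv` explicitly keeps the function easy to call from a notebook or a test, while `parse_args()` with no arguments would read from the real command line.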

stevehadd commented 2 years ago

Based on our discussion, I created an issue to update the metadata (#15).

hannahbrown7 commented 2 years ago

This is easily fixed once the data is loaded in, but I thought it worth noting down somewhere that the radar accumulations are in mm and the model data is in m. It might be worth aligning the units or including units in the column headers.
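One possible way to handle the unit mismatch after loading is sketched below with pandas: convert the model columns from m to mm and put the unit in the column name. The column name `rainfall_accumulation` and the sample values are made up for illustration and are not taken from the actual dataset.

```python
import pandas as pd

M_TO_MM = 1000.0  # 1 m = 1000 mm


def model_precip_to_mm(df, columns):
    """Return a copy of df with the given columns converted from m to mm.

    Each converted column is renamed with a "_mm" suffix so that the unit
    is visible in the column header.
    """
    out = df.copy()
    for col in columns:
        out[f"{col}_mm"] = out[col] * M_TO_MM
        out = out.drop(columns=col)
    return out


# Hypothetical model data with accumulations in metres.
model_df = pd.DataFrame({"rainfall_accumulation": [0.001, 0.0025]})
converted = model_precip_to_mm(model_df, ["rainfall_accumulation"])
print(converted)
```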

hannahbrown7 commented 2 years ago

The forecast_reference_time and forecast_period columns in the radar dataset and the model data do not align, so we get duplicated columns when merging the two together. It may be worth removing them from the radar data before merging, as forecast_reference_time is the same as period_midpoint and forecast_period is all zeros.
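A minimal sketch of that suggestion, assuming both tables are pandas DataFrames: check that the radar forecast_period really is all zeros, drop the two columns from the radar data, then merge. The join key `time`, the value columns `model_m` and `radar_mm`, and all sample values are hypothetical.

```python
import pandas as pd

times = pd.to_datetime(["2020-02-15 00:00", "2020-02-15 01:00"])

# Hypothetical model data: forecast columns carry real information here.
model_df = pd.DataFrame({
    "time": times,
    "forecast_reference_time": pd.to_datetime(["2020-02-14 18:00"] * 2),
    "forecast_period": pd.to_timedelta([6, 7], unit="h"),
    "model_m": [0.0012, 0.0004],
})

# Hypothetical radar data: forecast columns are redundant.
radar_df = pd.DataFrame({
    "time": times,
    "forecast_reference_time": times,
    "forecast_period": pd.to_timedelta([0, 0], unit="h"),
    "radar_mm": [1.1, 0.5],
})

# Confirm the radar forecast_period holds no information before dropping it.
assert (radar_df["forecast_period"] == pd.Timedelta(0)).all()

radar_trimmed = radar_df.drop(columns=["forecast_reference_time", "forecast_period"])

# With the redundant columns gone, the merge produces no _x/_y duplicates.
merged = model_df.merge(radar_trimmed, on="time", how="inner")
print(merged.columns.tolist())
```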

stevehadd commented 2 years ago

> I agree with the comment in the documentation about future work to make extract_mass.py and extract_mass_radar.py more general, as both have some hard-coded inputs, including file paths and parameters. Otherwise, both scripts look fine.

Just to note, I have opened issue #3 to make the data prep scripts more general purpose. Once we've got an initial pipeline set up, we should start considering additional data, so this refactoring will be important for processing it.