raehik closed this 11 months ago.
I think the data step is ready; it just needs some touching up before review. I'm adding some work on the training step here too, but I'll move it out before review.
I can't seem to get the MLflow interface working nicely with the simplified CLI. By simplified, I mean `--global_ {0,1}` and `--co2 {0,1}` being replaced with the flags `--cyclize` and `--co2-increase`. But MLproject doesn't support that kind of valueless option. I can't tell why; it seems like a very simple feature.
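One possible workaround, sketched here as an assumption rather than what this PR does: let each flag take an optional explicit value, so a bare `--cyclize` means true while an MLproject command template can still pass `--cyclize 0` or `--cyclize 1` from an ordinary parameter. Only the two flag names come from this PR; `str_to_bool` and `build_parser` are hypothetical.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse an explicit 0/1 (or true/false, yes/no) string into a bool."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes"):
        return True
    if lowered in ("0", "false", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="data step (hypothetical sketch)")
    # nargs="?" with const=True: a bare `--cyclize` stores True, while
    # `--cyclize 0` / `--cyclize 1` is converted by str_to_bool.
    # Omitting the flag entirely leaves the default, False.
    parser.add_argument("--cyclize", type=str_to_bool, nargs="?", const=True, default=False)
    parser.add_argument("--co2-increase", type=str_to_bool, nargs="?", const=True, default=False)
    return parser
```

On the MLproject side, the entry point could then declare `cyclize` and `co2_increase` as ordinary parameters defaulting to `0` and pass `--cyclize {cyclize} --co2-increase {co2_increase}` in its command, while people calling the script directly keep the plain flag form.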
On testing, this produces forcing data roughly 4x larger than the current code does. I'm not sure what sort of error would cause that, but I can go through the changes again. Lines that touch `gaussian_filter`, and code further up the call chain, seem the most likely culprits.
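Since `gaussian_filter` is the main suspect, a quick sanity check can rule the filter itself in or out. This is a sketch assuming the fields are plain NumPy arrays laid out as (time, lat, lon); `smooth_horizontal` and the sigma value are hypothetical, not the project's code. It relies on two properties of `scipy.ndimage.gaussian_filter`: a sigma of 0 skips that axis, and the kernel weights sum to 1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_horizontal(field: np.ndarray, sigma: float) -> np.ndarray:
    """Smooth a (time, lat, lon) array along its two horizontal axes only.

    A sigma of 0 on axis 0 makes gaussian_filter skip the time axis,
    so every time step is filtered independently.
    """
    if field.ndim != 3:
        raise ValueError(f"expected a 3D (time, lat, lon) array, got {field.ndim}D")
    return gaussian_filter(field, sigma=(0.0, sigma, sigma))


rng = np.random.default_rng(0)
field = rng.uniform(1.0, 2.0, size=(3, 24, 48))
smoothed = smooth_horizontal(field, sigma=2.0)

# Time steps stay independent: filtering one 2D slice alone gives the same result.
assert np.allclose(smoothed[1], gaussian_filter(field[1], sigma=2.0))

# The kernel is normalised, so a constant field comes back unchanged;
# the filter on its own should not rescale data by a factor like 4.
const = np.full((2, 10, 10), 3.0)
assert np.allclose(smooth_horizontal(const, sigma=2.0), 3.0)
```

If checks like these pass on the real call site, a uniform ~4x factor more likely comes from how the filtered fields are combined or returned further up the call chain.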
Likely candidates:
- `eddy_forcing` was misused: it returned both `forcing_coarse` and the edited `u_v_dataset` as a tuple, but its signature stated that it returned a single dataset, and it was used as such. Maybe my simplifying changed behaviour here...?
- `scipy.ndimage.gaussian_filter` was used oddly, with more erroneous type annotations. Probably fine, but it needed some inspection.

No, I misread some clauses, like this early return (`debug_mode` is unused):
There were many small mistakes! I'm now getting outputs identical to `main` for the same configuration. I need to clean up the history and rejig some code I re-messied.
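To back up a claim like "identical outputs to `main`", and to catch a uniform scaling such as the earlier ~4x, it helps to compare the two runs' outputs numerically. A minimal sketch, assuming both outputs can be loaded as NumPy arrays of the same shape; `compare_outputs` is a hypothetical helper.

```python
import numpy as np


def compare_outputs(new: np.ndarray, reference: np.ndarray) -> dict:
    """Summarise how far a refactored output is from a reference output."""
    if new.shape != reference.shape:
        raise ValueError(f"shape mismatch: {new.shape} vs {reference.shape}")
    diff = np.abs(new - reference)
    return {
        # Strict, element-wise equality.
        "identical": bool(np.array_equal(new, reference)),
        "max_abs_diff": float(diff.max()),
        # Ratio of root-mean-square values: a uniform scaling bug
        # (e.g. forcing 4x too large) shows up as a ratio far from 1.
        "rms_ratio": float(np.sqrt(np.mean(new**2)) / np.sqrt(np.mean(reference**2))),
    }
```

If the outputs are xarray datasets, `xarray.testing.assert_identical` performs the strict check directly, including coordinates and attributes.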
Cleaned up the history and the logging/debugging setup, and resolved all the to-dos I could (pre-existing ones that I'm unsure how to resolve are annotated and left in place). Ready for review.
yoooo it automatically merged? I had no idea that would happen. I rebased `dev` onto `data-step-refactor` locally and pushed, and GitHub has processed that as a merge!
There are some pain points with the current data step.
- `mlruns`: it uses 2 long random strings.
- `mlflow run` CLI is clunky
- Arguments are defined partially with `argparse` in `cmip26.py`, partially with MLflow (via `MLproject`, which is used by `mlflow run`), i.e. some positional arguments are upgraded to (required) options in MLflow
- `cmip26.py`
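One way to stop arguments being defined in two places is to declare them once and derive both the `argparse` options and the `MLproject` entry from that list. A sketch under assumptions: `Param`, `PARAMS`, the `scale`/`ntimes` parameters and both helpers are hypothetical. The only MLflow details relied on are that `MLproject` parameters have a type and a default, that the documented types are `string`, `float`, `path` and `uri`, and that the entry-point command substitutes `{name}` placeholders.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """One CLI option, declared once (hypothetical helper)."""
    name: str       # option name without leading dashes
    kind: type      # type argparse converts the raw string to
    default: object
    help: str = ""


# Hypothetical parameters, for illustration only.
PARAMS = [
    Param("scale", float, 4.0, "coarse-graining scale"),
    Param("ntimes", int, 100, "number of time steps to process"),
]


def build_parser(params: list[Param]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for p in params:
        parser.add_argument(f"--{p.name}", type=p.kind, default=p.default, help=p.help)
    return parser


def mlproject_entry(params: list[Param], script: str) -> dict:
    """Build the `parameters` and `command` parts of an MLproject entry point."""
    # There is no integer type among the documented MLproject types, so
    # anything that is not a float is declared as a string and argparse
    # converts it back.
    parameters = {
        p.name: {"type": "float" if p.kind is float else "string", "default": p.default}
        for p in params
    }
    # Every parameter becomes an explicit `--name {name}` option, so the same
    # options exist under MLflow and on the plain command line.
    options = " ".join(f"--{p.name} {{{p.name}}}" for p in params)
    return {"parameters": parameters, "command": f"python {script} {options}"}
```

Dumping the returned dict with a YAML library would give the `parameters:` and `command:` fields of one entry point, so the two interfaces cannot drift apart.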
This PR largely rewrites the data step. Unused code is removed. Stateful operations (globals) are moved into functions. The top-level script is now just a CLI and a handful of operations, mirroring how one would use it directly in Python.
- `BoundingBox`, CO2 increase handling

Some of the training step is touched too. Larger refactoring will be in another changeset.
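The structure described for the rewritten data step (plain functions with explicit inputs instead of globals, plus a thin CLI that only parses arguments and calls them) might look like the following. This is a hypothetical sketch: `load_field`, `coarsen` (a simple block average standing in for the real processing), `run` and the option names are placeholders, not the PR's actual functions.

```python
import argparse
from typing import Optional

import numpy as np


def load_field(n: int) -> np.ndarray:
    """Placeholder for loading input data: an n-by-n ramp."""
    return np.arange(n * n, dtype=float).reshape(n, n)


def coarsen(field: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a 2D field by an integer factor along both axes."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("field shape must be divisible by factor")
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


def run(n: int, factor: int) -> np.ndarray:
    """The whole step as one plain function, with no module-level state."""
    return coarsen(load_field(n), factor)


def main(argv: Optional[list[str]] = None) -> np.ndarray:
    """Thin CLI: parse arguments, then call the same function Python users would."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--n", type=int, default=8)
    parser.add_argument("--factor", type=int, default=4)
    args = parser.parse_args(argv)
    return run(args.n, args.factor)
```

Calling `run(4, 2)` from Python and passing `--n 4 --factor 2` on the command line do exactly the same work, which is the point of keeping the script layer this thin.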
Not done:
- `python` calls and explicit data locations instead of run IDs.

To-dos:
- `new`.
- `--co2-increase` flag work in MLproject

Related work to do post-merge: