m2lines / gz21_ocean_momentum

Stochastic-Deep Learning Parameterization of Ocean Momentum Forcing
MIT License
5 stars 1 forks source link

refactor data step script into library (API) and consumer (CLI) #85

Closed raehik closed 11 months ago

raehik commented 1 year ago

There are some pain points with the current data step.

This PR largely rewrites the data step. Unused code is removed. Stateful operations (globals) are moved into functions. The top-level script is now just a CLI and a handful of operations, mirroring how one would use it directly in Python.

Some of the training step is touched too. Larger refactoring will be in another changeset.

Not done:

To-dos:

Related work to do post-merge:

raehik commented 1 year ago

I think the data step is ready, just needs some touching up before review. I'm adding some work on the training step here too, I'll move it out before review.

raehik commented 1 year ago

I can't seem to get the MLflow interface working nicely with the simplified CLI. By simplified, I mean --global_ {0,1}, --co2 {0,1} being replaced with --cyclize, --co2-increase. But that type of no-value option aren't supported by MLproject. I can't tell why, it seems like a very simple feature.

raehik commented 1 year ago

On testing, this produces forcing data ~x4 larger than currently. Not sure what sort of errors would result in that, but I can go through the changes again. Lines that touch gaussian_filter and further up the call chain seem most likely.

raehik commented 12 months ago

Likely candidates:

raehik commented 12 months ago

No, I misread some clauses, like this early return (debug_mode is unused):

https://github.com/m2lines/gz21_ocean_momentum/blob/fff986c83e0b8288d84db5a302fe0ee8b30ee562/src/gz21_ocean_momentum/data/coarse.py#L192-L194

raehik commented 12 months ago

There were many small mistakes! I'm now getting identical outputs to main for the same configuration. Need to clean up the history and rejig some code I re-messied.

raehik commented 11 months ago

Cleaned up history and logging/debugging setup, sorted all the to-dos I can (prior-existing ones that I'm unsure how to resolve are annotated and left). Ready for review.

raehik commented 11 months ago

yoooo it automatically merged? I had no idea that would happen. I rebased dev onto data-step-refactor locally and pushed, and that's been processed as a merge on GitHub!