m2lines / gz21_ocean_momentum

Stochastic-Deep Learning Parameterization of Ocean Momentum Forcing
MIT License

Allow specifying forcing data for training step without going through MLflow #78

Closed raehik closed 1 year ago

raehik commented 1 year ago

The training step, trainScript.py, loads forcing data generated by a data processing run and trains a fresh neural net on it. The forcing data is located using MLflow: you provide a "run ID", and MLflow maps that to an existing "run" stored in the mlruns directory. (This directory is probably relative to the MLproject file, or to the directory that you run the mlflow command from, but I'm not certain.)

https://github.com/m2lines/gz21_ocean_momentum/blob/6958f5e6b4fbd1a86e2b5a3147b55c6355b64ec8/src/gz21_ocean_momentum/data/utils.py#L9-L26

This seems unnecessary -- why not allow a plain filepath to the forcing data to be provided instead? This would avoid problems loading data on machines where data is held in various places, not just the current working directory. (Specifically, I had this issue when running the training step on CSD3.)
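
A rough sketch of what I have in mind, not the current code. The function name, the CLI flag names, and the "forcing" artifact name are all made up for illustration; the only MLflow call used is mlflow.artifacts.download_artifacts, and it is only imported when a run ID is actually given, so the plain-filepath route works without touching mlruns at all.

```python
import argparse
from pathlib import Path
from typing import Optional


def resolve_forcing_data(
    forcing_data_path: Optional[str] = None,
    run_id: Optional[str] = None,
    artifact_path: str = "forcing",  # hypothetical artifact name, an assumption
) -> Path:
    """Return a local path to the forcing data.

    A plain filepath takes priority; the MLflow run ID is only a fallback.
    """
    if forcing_data_path is not None:
        path = Path(forcing_data_path).expanduser()
        if not path.exists():
            raise FileNotFoundError(f"forcing data not found: {path}")
        return path

    if run_id is not None:
        # Imported lazily so that MLflow is not needed for the filepath route.
        from mlflow.artifacts import download_artifacts

        return Path(download_artifacts(run_id=run_id, artifact_path=artifact_path))

    raise ValueError("provide either a forcing data path or an MLflow run ID")


def build_parser() -> argparse.ArgumentParser:
    """CLI sketch: exactly one of the two data sources must be given."""
    parser = argparse.ArgumentParser(description="training step (sketch)")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--forcing-data-path", help="plain path to forcing data")
    source.add_argument("--run-id", help="MLflow run ID of the data processing run")
    return parser
```

With something like this, the training script would call resolve_forcing_data(args.forcing_data_path, args.run_id) once at startup and pass the resulting path to whatever opens the dataset, so existing MLflow-based workflows keep working unchanged.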