The training step `trainScript.py` works by loading forcing data generated by a data-processing run and training a fresh neural net on it. This forcing data is located using MLflow: you provide a "run ID", and MLflow resolves it to an existing "run" stored in the `mlruns` directory. (This directory is probably local to the `MLproject` file, or to the directory the `mlflow` command is run from, but I'm not certain.)
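For context, here is a minimal sketch of how a run ID maps onto MLflow's default file-store layout (`mlruns/<experiment_id>/<run_id>/artifacts`). The helper and its name are hypothetical, written just to illustrate the lookup; the project itself goes through the MLflow client API rather than walking the directory:

```python
from pathlib import Path

def resolve_run_artifacts(tracking_dir: Path, run_id: str) -> Path:
    """Hypothetical helper: locate a run's artifacts directory inside an
    MLflow file-store tracking directory, whose default layout is
    mlruns/<experiment_id>/<run_id>/artifacts."""
    for experiment_dir in tracking_dir.iterdir():
        candidate = experiment_dir / run_id
        if candidate.is_dir():
            return candidate / "artifacts"
    raise FileNotFoundError(f"run {run_id} not found under {tracking_dir}")
```

This is exactly why the lookup is sensitive to where `mlruns` lives: the run ID only means something relative to a particular tracking directory.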
This seems unnecessary -- why not also allow providing a plain filepath to the forcing data? That would avoid issues loading data when executing on machines where data is held in various places, not just the current working directory. (Specifically, I hit this issue when running the training step on CSD3.)
https://github.com/m2lines/gz21_ocean_momentum/blob/6958f5e6b4fbd1a86e2b5a3147b55c6355b64ec8/src/gz21_ocean_momentum/data/utils.py#L9-L26
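One way the proposal could look at the CLI level: accept either a run ID or a direct path, mutually exclusive. The flag names here are hypothetical, a sketch of the interface rather than the project's actual argument parser:

```python
import argparse

parser = argparse.ArgumentParser(description="training step (illustrative CLI)")
source = parser.add_mutually_exclusive_group(required=True)
source.add_argument("--run-id",
                    help="MLflow run ID of a data-processing run (looked up via MLflow)")
source.add_argument("--forcing-data-path",
                    help="plain filepath to forcing data (no MLflow lookup)")

# e.g. on a cluster where the data lives outside the working directory:
args = parser.parse_args(["--forcing-data-path", "/scratch/forcing.zarr"])
```

With a mutually exclusive group, the existing run-ID workflow keeps working unchanged while the plain-path option sidesteps the `mlruns` location problem entirely.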