Open jbusecke opened 1 week ago
I believe this name would be good: Modelname_epoch_train_dataset_eval_dataset_2D/3D.
We do not require the version of preprocessing because my workflow saves the source code at the point of training. It also saves the configuration of the training (number of GPUs/ machine etc.).
But there could be a case when the preprocessing was run with a different version than the training, right? We ought to capture both? But I think as long as we have the repo+version in each dataset, and then add the naming of the input dataset to the prediction, we have full provenance.
Yepp, it actually stores the entire source code not just the training code. But sure we could add the hash. Could you provide a simple example of a file just to confirm my understanding?
I am writing the hash of the preprocessing into the input datasets attributes, so you could grab it from there. Ill show an example once I win the battle with dask to write out this damn dataset.
Can we come up with a generic naming scheme for input output names?
What are the parameters we need to distinguish?
Input:
Output: