Closed: AUdaltsova closed this 1 month ago
Ok, to be fair, it turns out the versioning happens because different runs in one project save their artifacts under the same name, so each run has one file per epoch rather than multiple, as I originally thought. The naming might still become a problem, though, because AFAIK the main use case for these files is to be downloaded for further analysis. If so, getting the correct one can be tricky; maybe we want to prefix them by run or something. Or maybe people who know wandb.api better than I do can already do this neatly and it's a non-issue?
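For reference, a minimal sketch of what fetching one of these artifacts through the public API looks like, as far as I can tell; the entity/project/artifact names are placeholders:

```python
import wandb

api = wandb.Api()

# Artifacts logged under the same name by different runs end up as
# versions (v0, v1, ...) of a single artifact, so you need to know
# which version belongs to which run.
artifact = api.artifact("my-entity/my-project/val-predictions:v3")
path = artifact.download()  # returns the local download directory
print(path)

# logged_by() should give back the run that produced this version,
# which is one way to disambiguate versions after the fact.
run = artifact.logged_by()
print(run.id, run.name)
```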
Weird, I would have thought each artifact is separate per experiment, but maybe I'm wrong there. Is there a model ID we can use to suffix it?
So, fun fact: the horizon loss curve tables do get prefixed by run-{id}, so it should be easy to copy that behaviour over here (something like the sketch below) as soon as I can find where it happens :)
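Not the actual fix, just a sketch of the kind of change I mean; the artifact name, type, and file path are made up:

```python
import wandb

run = wandb.init(project="my-project")

# Hypothetical: prefix the artifact name with the run id, mirroring
# what already happens for the horizon loss curve tables, so downloads
# from different runs don't collide under one artifact name.
artifact = wandb.Artifact(name=f"run-{run.id}-val-predictions", type="predictions")
artifact.add_file("val_predictions.csv")
run.log_artifact(artifact)
```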
Pull Request
Description
The array that stores inference results for wandb CSV saving was not getting cleared between validation epochs, which blew up the file sizes. Added clearing the results after logging.
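The change is roughly the following (a minimal sketch; `self.val_results` and the logging key are stand-ins for the actual names in the codebase):

```python
import pandas as pd
import wandb

class Model:
    def __init__(self):
        # accumulates one row per prediction during the current val epoch
        self.val_results = []

    def on_validation_epoch_end(self):
        table = wandb.Table(dataframe=pd.DataFrame(self.val_results))
        wandb.log({"val_predictions": table})
        # Before the fix this list was never reset, so each epoch's CSV
        # also contained all previous epochs' rows.
        self.val_results.clear()
```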
Potentially fixes #255, as the files stay at 19-20 MB (for me at least; this will obviously differ between configs, here it's 3 quantiles and 32-point predictions) instead of growing to 240+ MB by epoch 11. (I still need to stop keeping multiple versions of one artifact; not sure how to do that yet, but see the sketch below for one possible route.)
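One possible route, assuming the public API behaves the way the docs suggest; the artifact type and name here are placeholders, and I haven't verified the delete semantics myself:

```python
import wandb

api = wandb.Api()

# Delete every version of the artifact except the one aliased "latest",
# so only a single version per artifact name is kept around.
for version in api.artifact_versions("predictions", "my-entity/my-project/val-predictions"):
    if "latest" not in version.aliases:
        version.delete()
```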
How Has This Been Tested?
Ran training for 3 epochs and checked that the resulting CSVs had the expected sizes and contents.
Checklist: