openclimatefix / PVNet

PVnet main repo
MIT License
21 stars 5 forks source link

add flushing of val epoch resluts #256

Closed AUdaltsova closed 1 month ago

AUdaltsova commented 2 months ago

Pull Request

Description

array that stores inference results for wandb csv saving was not getting cleaned between val epochs, resulting in blowing up file sizes. Added removing results after logging.

Potentially fixes #255 as the files will be kept to 19-20 MB (for me at least; obviously will be different for different configs, this is 3 quantiles and 32 points predictions) instead of getting to 240+MB by epoch 11 etc. (Still need to remove keeping multiple versions of one artifact, not sure how to do that yet)

How Has This Been Tested?

Run training for 3 epochs, checked resulting csvs had expected sizes and contents

Checklist:

AUdaltsova commented 2 months ago

Ok to be fair turns out versioning happens because different runs in one project save their artifacts under the same name, so one run has one file per epoch instead of multiple as I originally thought. The naming might become a problem though bc afaik the main usecase for these files is to be downloaded for further analysis? If so getting the correct one can get kinda tricky, maybe we want to prefix them by run or something. Or maybe people who can use wandb.api better than me can actually do it neatly and this is a non-isssue?

peterdudfield commented 2 months ago

Ok to be fair turns out versioning happens because different runs in one project save their artifacts under the same name, so one run has one file per epoch instead of multiple as I originally thought. The naming might become a problem though bc afaik the main usecase for these files is to be downloaded for further analysis? If so getting the correct one can get kinda tricky, maybe we want to prefix them by run or something. Or maybe people who can use wandb.api better than me can actually do it neatly and this is a non-isssue?

Weird, I would have thought each articfact is seperate for experiment, but maybe im wrong there. is there a model id we can use to suffix it?

AUdaltsova commented 2 months ago

Weird, I would have thought each articfact is seperate for experiment, but maybe im wrong there. is there a model id we can use to suffix it?

so fun fact, horizon loss curve tables do get prefixed by run-{id}, so should be easy to copy it to here as soon as I can find where this happens :)