megvii-research / RevCol

Official Code of Paper "Reversible Column Networks" "RevColv2"
Apache License 2.0

Best checkpoint saving problem #13

Closed seoneut closed 1 year ago

seoneut commented 1 year ago

At L204 in main.py, the checkpoint is saved, and the check for whether to save the best checkpoint is run, only when training hits a SAVE_FREQ epoch. The problem is that if the model happens to obtain its best accuracy outside a SAVE_FREQ epoch, the best weights will not be saved.

Since most of our team members run the code on relatively small datasets, we tend to set SAVE_FREQ to a large value to save hard drive space. This means training keeps missing the chance to save best.pth.

Is this intended behavior? Could you please add an argument that controls whether best.pth is saved every time the model reaches a new highest accuracy, even when that epoch is not a SAVE_FREQ epoch?

nightsnack commented 1 year ago

Hi seoneut, this doesn't seem like a hard problem. You could modify the code to compare the previous best accuracy with the current accuracy and, if the current one is higher, call the save-checkpoint function.
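A minimal sketch of that suggestion, not the repository's actual code: the function name `train_with_best_tracking`, the `save_checkpoint(epoch, tag)` callback, and the list of per-epoch accuracies are all hypothetical stand-ins for the real training loop, validation step, and save function in main.py. The only point it illustrates is that the best-accuracy comparison runs every epoch, independently of the SAVE_FREQ check.

```python
from typing import Callable, Iterable, List, Tuple


def train_with_best_tracking(
    accuracies: Iterable[float],
    save_checkpoint: Callable[[int, str], None],
    save_freq: int,
) -> Tuple[float, List[Tuple[int, str]]]:
    """Decouple "best" saving from periodic saving.

    accuracies: per-epoch validation accuracy (stands in for the real
        train + validate steps, which are not shown here).
    save_checkpoint: hypothetical callback taking (epoch, tag).
    save_freq: period, in epochs, of the regular checkpoint.
    """
    best_acc = float("-inf")
    saved: List[Tuple[int, str]] = []
    for epoch, acc in enumerate(accuracies):
        # Periodic checkpoint: only on SAVE_FREQ epochs.
        if epoch % save_freq == 0:
            save_checkpoint(epoch, "periodic")
            saved.append((epoch, "periodic"))
        # Best checkpoint: compared every epoch, regardless of SAVE_FREQ.
        if acc > best_acc:
            best_acc = acc
            save_checkpoint(epoch, "best")
            saved.append((epoch, "best"))
    return best_acc, saved


if __name__ == "__main__":
    log: List[Tuple[int, str]] = []
    best, _ = train_with_best_tracking(
        [0.5, 0.7, 0.6, 0.9, 0.8],
        lambda epoch, tag: log.append((epoch, tag)),
        save_freq=4,
    )
    print(best, log)
```

With a large `save_freq`, the best epoch (epoch 3 in the example) still triggers a "best" save even though it is not a periodic-save epoch. In a real run the callback would write best.pth, overwriting the previous one, so disk usage stays at a single extra file.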

For this codebase, we are going to release the next generation of RevCol and are rebuilding the whole codebase, so I'm not going to modify the online version now.

seoneut commented 1 year ago

For this codebase, we are going to release the next generation of RevCol and are rebuilding the whole codebase.

Glad to know. Thanks!