juhongm999 / hsnet

Official PyTorch Implementation of Hypercorrelation Squeeze for Few-Shot Segmentation, ICCV 2021

Training time #12

Closed SunghwanHong closed 3 years ago

SunghwanHong commented 3 years ago

Hi! Thanks for making this wonderful work publicly available. I have a question about training time. Currently, I'm trying to reproduce the PASCAL results in the paper.

The default setting for niter is 2000, and the repository says training takes around 2 days to converge using four 2080 GPUs. Does this mean training converges before it reaches 2000 epochs, or do we have to run all 2000? With a batch size of 20, it takes around 5 minutes to train one epoch, so running 2000 epochs would take about 10000 minutes?

Also, validation seems to take around 3 times longer than training. Am I doing something wrong? Thank you!

juhongm999 commented 3 years ago

Thanks for your interest in our research. We follow the standard early-stopping procedure to choose the best-performing model, i.e., the model at the epoch where the validation mIoU curve starts to saturate. Since we normally do not know in advance when that happens, one typically sets the number of iterations to a generous upper bound (niter=2000 in our case), keeps an eye on the training process, and picks the best model based on validation performance. This is what we did in our experiments.
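
A minimal sketch of that monitoring loop, assuming hypothetical `train_one_epoch`/`evaluate_miou` helpers (they are not functions from this repository) and a patience-based stop as one concrete way to formalize "the curve starts to saturate":

```python
import torch

best_miou, patience, stale = 0.0, 30, 0  # patience value is illustrative only

for epoch in range(2000):                # niter=2000 acts as an upper bound, not a target
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    miou = evaluate_miou(model, val_loader)          # hypothetical helper

    if miou > best_miou:
        # keep the checkpoint with the best validation mIoU seen so far
        best_miou, stale = miou, 0
        torch.save(model.state_dict(), 'best_model.pt')
    else:
        stale += 1
        if stale >= patience:            # validation curve has saturated; stop early
            break
```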

Validation takes a lot more time than training because nworker for the validation dataloader is set to 0, whereas it is set to 8 for training; this is done to reproduce exactly the same results as in our paper by removing stochasticity in sampling support examples. Note that there are ways to get around the slow validation while keeping reproducibility, e.g., by deterministically setting the random seed in the dataloader with nworker > 1.
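
For reference, one way to do that is the standard PyTorch worker-seeding pattern (`worker_init_fn` plus a seeded `generator`); `dataset_val` and the batch size below are placeholders, not the repository's actual settings:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive a deterministic seed for each worker from the loader's base seed,
    # so support-example sampling is repeatable across runs even with nworker > 1.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)  # fixed base seed makes the sampling order reproducible

val_loader = DataLoader(dataset_val, batch_size=20, shuffle=False,
                        num_workers=8, worker_init_fn=seed_worker, generator=g)
```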