tomatowithpotato closed this issue 3 years ago
I noticed that the batch size in the new version has been adjusted to 10, lower than the 12 in the old version, so the problem no longer exists.
You can use the maximum batch size that fits on your GPU. The new version of the code uses gradient accumulation: it accumulates gradients until an effective batch size of 128 is reached and only then applies the weight update. This is especially important for training on the AVA dataset, which has more classes.
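For reference, here is a minimal sketch of the gradient accumulation pattern described above, in PyTorch. The tiny model, random data, and batch sizes are placeholders for illustration only, not the actual code or hyperparameters from this repository.

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; not the repo's actual model.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

micro_batch = 10          # what fits on the GPU per step (placeholder value)
effective_batch = 128     # target batch size built up via accumulation
accumulation_steps = max(1, effective_batch // micro_batch)

optimizer.zero_grad()
for step in range(100):
    inputs = torch.randn(micro_batch, 16)
    targets = torch.randint(0, 4, (micro_batch,))
    # Scale the loss so the accumulated gradient matches a full-batch average.
    loss = criterion(model(inputs), targets) / accumulation_steps
    loss.backward()       # gradients accumulate in .grad across iterations
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()          # update weights only once the effective batch is complete
        optimizer.zero_grad()
```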
Thanks for your reply! Gradient accumulation is a good way to reduce memory usage effectively. I also plan to try float16 training and hope it works.
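In case it helps, a minimal sketch of float16 (mixed-precision) training with `torch.cuda.amp` is below. It assumes a CUDA-capable GPU, and the model and data are again placeholders rather than this repository's code.

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid float16 gradient underflow

for step in range(100):
    inputs = torch.randn(10, 16, device=device)
    targets = torch.randint(0, 4, (10,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # run the forward pass in float16 where safe
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()         # backward on the scaled loss
    scaler.step(optimizer)                # unscale gradients, then take the optimizer step
    scaler.update()                       # adjust the loss scale for the next iteration
```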
I train on a single RTX 3060 with 12 GB using the default settings, but GPU memory is not enough. As far as I know, the TITAN Xp also has 12 GB.
So I am curious: what were the actual training parameters?