The first time, the whole training process completes when batch_size is 8 and num_workers is 1. But when I start a new training run, GPU memory suddenly increases a lot and eventually a CUDA out of memory error is raised. Even adjusting batch_size down to 2 only slows the training process; it doesn't prevent the error. I get this result on both Colab and my local machine. Does anyone know why?
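In case it helps, here is a minimal sketch of what I mean by "starting a new training" in the same session. The model, data, and the run_training helper below are just placeholders, not my actual code; the point is that the first call finishes fine, and the second call in the same Colab/notebook session is where memory climbs until CUDA runs out.

```python
import gc
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def run_training(batch_size, num_workers=1):
    # Dummy model and data standing in for my real training code.
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
    data = TensorDataset(torch.randn(256, 1024), torch.randint(0, 10, (256,)))
    loader = DataLoader(data, batch_size=batch_size, num_workers=num_workers)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    return model

model = run_training(batch_size=8)   # first run completes without problems

# Second run in the same session: GPU memory jumps and eventually raises
# CUDA out of memory, even with batch_size=2 (which only makes it slower).
model = run_training(batch_size=2)

# Am I supposed to release the previous run manually between trainings, e.g.
#   del model; gc.collect(); torch.cuda.empty_cache()
# or is something else holding on to the memory?
```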