Hi,
I am trying to run the pretraining of the full model (which should have ~650M parameters) on a 24GB GPU, and it only runs if I set the batch size to 1 (which makes training useless). How much memory would be necessary to run the full training with the preset batch size?
Also, once training finished, I tried to run the Kmeans fitting script, and it seems to require even more memory. Any idea what that step needs as well?
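For context, here is the back-of-envelope estimate I used, assuming fp32 weights and a standard Adam optimizer (two extra state tensors per parameter); it only counts static memory, so activations (which grow with batch size and sequence length) come on top of this:

```python
def training_memory_gb(n_params, bytes_per_param=4, adam_state_tensors=2):
    """Rough static GPU memory for training: weights + grads + optimizer states.

    Excludes activations, which scale with batch size and sequence length.
    """
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    optimizer = n_params * bytes_per_param * adam_state_tensors
    return (weights + grads + optimizer) / 1024**3

# ~650M parameters: roughly 9.7 GB before any activations,
# which explains why 24GB only fits a tiny batch.
print(round(training_memory_gb(650e6), 1))
```

So even before activations, more than a third of the 24GB card is gone, which is why I suspect the preset batch size needs a much larger card.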
Thanks!