ali-john opened this issue 3 months ago
Hey, I was training with hyper-parameters similar to those in the code and in your thesis, but I am getting a CUDA out-of-memory error. Could you tell me how much GPU memory is actually required to train this network?
Hi, I trained the proposed model on NVIDIA A100 GPUs (40 GB memory) provided by the Digital Research Alliance of Canada. You can find more information on the hardware here: https://docs.alliancecan.ca/wiki/Narval/en#:~:text=4%20x%20NVidia%20A100SXM4%20(40%20GB%20memory)%2C%20connected%20via%20NVLink
As I recall, the batch size was very small because of memory constraints, so reducing the batch size could be a possible fix.
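The suggestion to reduce the batch size can be automated by halving it until one training step runs without running out of memory. The sketch below is not code from the repository. `run_step` is a hypothetical callable that runs one forward/backward pass at a given batch size, and `fake_step` only simulates a GPU that runs out of memory above a batch size of 12. With PyTorch, you could pass `oom_error=torch.cuda.OutOfMemoryError`, which recent versions raise (older versions raise a plain `RuntimeError`). In real training you may also want to call `torch.cuda.empty_cache()` before retrying.

```python
from typing import Callable, Type


def find_max_batch_size(
    run_step: Callable[[int], None],
    start: int,
    oom_error: Type[BaseException] = MemoryError,
    minimum: int = 1,
) -> int:
    """Halve the batch size until one step runs without an out-of-memory error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)  # hypothetical: one forward/backward pass
            return batch_size
        except oom_error:
            batch_size //= 2  # back off and retry with half the batch
    raise RuntimeError(f"even batch size {minimum} does not fit in memory")


def fake_step(batch_size: int) -> None:
    # Simulated step: pretend anything above 12 samples exhausts GPU memory.
    if batch_size > 12:
        raise MemoryError("simulated CUDA out of memory")


# 64, 32 and 16 fail; 8 fits.
print(find_max_batch_size(fake_step, start=64))  # → 8
```

If the batch size that fits is much smaller than the one in the original hyper-parameters, gradient accumulation is a common way to keep the effective batch size unchanged, although whether that matches the training setup of this model is an assumption.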