Open Kamranaway opened 4 days ago
Of note, the server doesn't have the NUMA notes, but the output was identical otherwise (this log is from a WSL instance).
Hello, sorry for the inconvenience. I guess the cuDNN and/or CUDA version you have is incompatible with the tensorflow_gpu version installed in the container. See the list on the following Stack Overflow page for more details: https://stackoverflow.com/questions/75789104/cubin-cuda-error-no-binary-for-gpu-error-while-running-attention-layer-with-bid Please try to identify the tensorflow_gpu version in the container, then find and install the compatible cuDNN and CUDA versions, and let us know if this fixes the issue. :)
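In case it helps, here is a minimal sketch of how the TensorFlow version and the CUDA/cuDNN versions it was built against could be read from inside the container. The function name `describe_tf_build` is made up for illustration, and I am assuming a Python interpreter is available in the container. `tf.sysconfig.get_build_info()` only exists in TensorFlow 2.3 and later, so the sketch falls back to empty values on older versions.

```python
import importlib.util


def describe_tf_build():
    """Return the TensorFlow version plus the CUDA/cuDNN versions it was built against.

    Hypothetical helper for illustration; values stay None when they cannot be determined.
    """
    info = {
        "tensorflow_version": None,
        "cuda_version": None,
        "cudnn_version": None,
        "visible_gpus": [],
    }
    if importlib.util.find_spec("tensorflow") is None:
        return info  # TensorFlow is not installed in this environment

    try:
        import tensorflow as tf
    except ImportError:
        return info  # installed, but failed to load (e.g. missing shared libraries)

    info["tensorflow_version"] = tf.__version__

    # get_build_info() is only available in TensorFlow 2.3 and later.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        info["cuda_version"] = build.get("cuda_version")
        info["cudnn_version"] = build.get("cudnn_version")

    # list_physical_devices() is only available in TensorFlow 2.1 and later.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is not None:
        info["visible_gpus"] = [device.name for device in list_devices("GPU")]

    return info


if __name__ == "__main__":
    for key, value in describe_tf_build().items():
        print(f"{key}: {value}")
```

The reported versions can then be compared against the compatibility list linked above. An empty `visible_gpus` list would suggest TensorFlow does not see the GPU at all, which is a different problem from a version mismatch.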
PS: As far as I remember, we needed a GPU with at least 11 GB of VRAM to run bertax (at least while training). So I would try the changes discussed above on the A30 first, if possible. :)
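To check the VRAM on each machine, something along these lines might work. It is only a sketch: the helper names are made up, the 11 GB figure is from memory as noted above, and the threshold is deliberately lenient (1000 MiB per GB) because drivers tend to report slightly less than the nominal card size. It relies on `nvidia-smi` being on the PATH and returns an empty list otherwise.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'name, memory_in_MiB' lines into a list of (name, MiB) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        name, total_mib = line.rsplit(",", 1)
        try:
            gpus.append((name.strip(), int(total_mib)))
        except ValueError:
            continue  # memory reported as something non-numeric, skip this entry
    return gpus


def has_enough_vram(total_mib, required_gb=11):
    """Lenient check (assumption): treat 1 GB as 1000 MiB to allow for driver overhead."""
    return total_mib >= required_gb * 1000


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for name, total_mib in query_gpus():
        verdict = "ok" if has_enough_vram(total_mib) else "probably too small"
        print(f"{name}: {total_mib} MiB ({verdict})")
```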
I tested on a server with an A30 GPU and a laptop with an RTX 3060. I believe I followed all steps in the setup guide.