microsoft / rat-sql

A relation-aware semantic parsing model from English to SQL
https://arxiv.org/abs/1911.04942
MIT License
406 stars 117 forks

Can't train the model with GPU on a server with an RTX 3090 #57

Open Quasimoodo opened 3 years ago

Quasimoodo commented 3 years ago

I first ran the code with its default config on my server, but later I noticed that training was actually running on my CPU, and nvidia-smi returned an error. I then found on Docker Hub that I can give the container GPU access by passing --gpus all to docker run, i.e. replacing docker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql with docker run --rm --gpus all -m4g -v /path/to/data:/mnt/data -it ratsql.

After that, nvidia-smi worked inside the container, but when I trained the model it failed with "the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331". Searching the internet, I found that CUDA 11+ is required for RTX 30xx GPUs. I then changed the Dockerfile base image to pytorch/pytorch:1.5-cuda10.1-cudnn7-devel and rebuilt the image, but the same error occurred again.

I wonder whether I can train the model with a GPU in Docker at all. Kindly help me resolve this issue; any help would be really appreciated.
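For reference, the steps described above can be sketched as follows (the image name ratsql and the data path are from the report above; the CUDA 11 base-image tag is an assumption — RTX 30xx cards need CUDA 11+, so a CUDA 10.x base cannot work on this GPU):

```shell
# Run the container with GPU access (requires the NVIDIA Container Toolkit
# on the host; --gpus all exposes all host GPUs to the container).
docker run --rm --gpus all -m4g -v /path/to/data:/mnt/data -it ratsql

# Inside the container, verify the GPU is visible:
nvidia-smi

# Note: RTX 30xx (Ampere) GPUs require CUDA 11+, so a cuda10.1 base image
# will fail regardless of the PyTorch version. A candidate Dockerfile base
# (tag is an assumption — check Docker Hub for currently available tags):
#   FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
```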

m1nhtu99-hoan9 commented 3 years ago

As I understand your issue, you should first check whether PyTorch recognises your CUDA device. Try this in a terminal or any console: python3 -c "import torch; assert(torch.cuda.is_available())". What is the output?
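A slightly fuller diagnostic along the same lines (a sketch using standard torch APIs; it only reports what PyTorch sees, it does not fix anything):

```python
def cuda_report():
    """Return a one-line summary of PyTorch's view of the CUDA setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        # Typical causes: container started without --gpus all, or a
        # PyTorch build whose CUDA version does not support the GPU.
        return "CUDA device not visible to PyTorch"
    return (f"device: {torch.cuda.get_device_name(0)}, "
            f"built for CUDA {torch.version.cuda}")

print(cuda_report())
```

If this prints "CUDA device not visible to PyTorch" even though nvidia-smi works in the container, the mismatch is usually between the GPU architecture and the CUDA version the PyTorch wheel was built against.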