lambdald opened this issue 1 month ago
Hello, the problem is that your computer does not have enough GPU memory to run the program. The easiest solution is to create a GitHub Codespace for this repository and then run the program the same way you would in Visual Studio Code.

To create the GitHub Codespace you have to:

- Open this repository
- Press the `.` key
- Open the terminal as you would in Visual Studio Code (in my case "Ctrl + ñ")
- The terminal will give you the option to create a local clone or a GitHub Codespace; choose the option to create a GitHub Codespace

And that's all.
Possible, but the message says that the code was trying to allocate 180 GB, which is a bit insane. So it looks like some sort of bug.
@lambdald If you want us to take a look please provide the full Dockerfile you are using. If you are not using Docker, it's gonna be hard to recreate your issue since it's likely setup-specific.
True, I didn't see the end of the error.
I have the same error when I run small_city; it may be a bug in the program. Searching for information, I found this: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch. I'm unable to make it work.
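The usual workaround from that StackOverflow thread can be sketched as a batch-halving retry. This is a hypothetical helper, not code from this repository; it assumes PyTorch ≥ 1.13, where `torch.cuda.OutOfMemoryError` exists as a distinct exception:

```python
import torch

def run_with_oom_fallback(step, batch_size, min_batch=1):
    """Retry `step(batch_size)` with a halved batch on CUDA OOM.

    Hypothetical helper: `step` is any callable that raises
    torch.cuda.OutOfMemoryError when the batch does not fit.
    """
    while True:
        try:
            return step(batch_size)
        except torch.cuda.OutOfMemoryError:
            if batch_size <= min_batch:
                raise  # nothing left to shrink; surface the real error
            batch_size //= 2
            torch.cuda.empty_cache()  # return cached blocks to the driver before retrying
```

Note this only papers over genuine memory pressure; it would not help with the 180 GB allocation reported above, which points at a bug rather than a tight fit.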
@Snosixtyboo Sorry, I use conda to manage my environment instead of Docker, and I set up the Python environment following the README.
same bug
Try this:

- Open the Python console by running `python3`
- Import torch and clear the cache:

```python
import torch
torch.cuda.empty_cache()
```

Reference: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
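As a quick sanity check (a minimal sketch, assuming only a working PyTorch install; it prints zeros on a machine without a visible GPU), you can compare what the caching allocator has actually handed out to live tensors versus what it is merely holding in reserve — `empty_cache()` can only return the reserved-but-unallocated part to the driver:

```python
import torch

def report_cuda_memory(label):
    # memory_allocated: bytes currently held by live tensors.
    # memory_reserved: bytes the caching allocator has claimed from the driver.
    alloc = torch.cuda.memory_allocated()
    reserved = torch.cuda.memory_reserved()
    print(f"{label}: allocated={alloc / 2**20:.1f} MiB, reserved={reserved / 2**20:.1f} MiB")
    return alloc, reserved

report_cuda_memory("before empty_cache")
torch.cuda.empty_cache()  # releases cached, unused blocks back to the driver
report_cuda_memory("after empty_cache")
```

If `allocated` stays huge after `empty_cache()`, the memory is held by live tensors and clearing the cache cannot help.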
I tried, and I still meet the bug
Run this code and tell us what it says:

```python
import torch
print(torch.cuda.memory_summary(device=None, abbreviated=False))
```
Same problem. OOM was encountered on a GPU with 80 GB memory but not encountered on a GPU with 8 GB memory, using the same dataset.
@kevintsq Interesting, tell us more about the two computers.
The former is an HPC using Slurm on Linux, and the latter is a Windows laptop. I should have compiled the submodules according to the correct CUDA compute capabilities. I've tried CUDA 12.1, 12.3, 12.4, 12.5 + PyTorch 2.3, 2.4 on the HPC, but the problem persists (on 12.1 it is an illegal memory access instead). The laptop runs well on CUDA 12.4 + PyTorch 2.4.
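One thing worth double-checking on the HPC side is that the CUDA extensions were actually built for the cluster's GPUs. A sketch, with hypothetical values (8.0 is an A100's compute capability; substitute your own) using PyTorch's standard `TORCH_CUDA_ARCH_LIST` variable before rebuilding:

```shell
# Hypothetical value: an A100 has compute capability 8.0; check yours with
#   nvidia-smi --query-gpu=compute_cap --format=csv
export TORCH_CUDA_ARCH_LIST="8.0"
# ...then rebuild/reinstall the CUDA submodules in this environment.
```

Building for the wrong architecture can show up as illegal memory accesses or nonsense allocation sizes rather than a clean error.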
I met with the same issue on ubuntu 22.04, and fixed it by switching from cuda 12.5 to 12.2.
Hello, when I was running the small_city data, I encountered the following error. How can I solve it?