Open yangdongchao opened 2 years ago
Well, I don't know why it is happening on your side, I am afraid. Are you using Windows?
If I were you, I would check whether you can train a model (not SpecVQGAN) in a distributed setting using pure PyTorch.
If you can train one, I would look into PyTorch Lightning. It seems to miss one of your GPUs.
Also, could you please share the output of `nvidia-smi` and `torch.cuda.device_count()`?
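One common reason for a mismatch between `nvidia-smi` and `torch.cuda.device_count()` is the `CUDA_VISIBLE_DEVICES` environment variable: if it is set to a single index, PyTorch will only see one GPU no matter how many the machine has. A minimal check (plain Python, no GPU required to run):

```python
import os

# CUDA_VISIBLE_DEVICES restricts which GPUs CUDA programs can see.
# None means no restriction; "0" would hide every GPU except index 0.
visible = os.environ.get("CUDA_VISIBLE_DEVICES")
print("CUDA_VISIBLE_DEVICES =", visible)
```

If this prints a single index (or an empty string) inside the shell where you launch training, that would explain Lightning reporting fewer GPUs than `nvidia-smi` shows.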
Thanks for your reply, I have solved this problem.
How did you solve it?
In fact, I didn't do anything. When I run the codebook training, it still only uses one GPU, but when I train the transformer, it can use multiple GPUs. So I gave up on using multiple GPUs for codebook training.
python3 train.py --base vas_codebook.yaml -t True --gpus 0,1,
When I try to run the code with two GPUs, it reports the error: pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0, 1] But your machine only has: []
But if I only use GPU 0, no error happens. So I want to ask how to use multiple GPUs to train this code?
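For context, the error comes from Lightning validating the requested GPU indices against what PyTorch can actually see. A hypothetical sketch of that check (not Lightning's actual code; `parse_gpu_ids` is an illustrative name) shows why `--gpus 0,1,` fails when the machine reports zero visible devices:

```python
def parse_gpu_ids(gpus, available_count):
    """Parse a spec like "0,1," (trailing comma allowed) into GPU indices
    and validate them against the number of devices actually visible."""
    requested = [int(g) for g in str(gpus).split(",") if g.strip()]
    available = list(range(available_count))
    if any(g not in available for g in requested):
        # Mirrors the shape of Lightning's MisconfigurationException message
        raise ValueError(
            f"You requested GPUs: {requested} "
            f"But your machine only has: {available}"
        )
    return requested
```

With two visible GPUs, `parse_gpu_ids("0,1,", 2)` returns `[0, 1]`; with zero visible GPUs it raises the same "only has: []" complaint seen above, which points at the environment (driver, `CUDA_VISIBLE_DEVICES`, or the CUDA build of PyTorch) rather than the training script itself.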