Open MojtabaAbdi opened 1 month ago
I have the same problem.
I fixed this problem with a quick fix: changing line 56 to torch.float32 in the file /content/ml-agents/ml-agents/mlagents/torch_utils/torch.py. P.S. This line has already been fixed in the screenshot.
Hi, I think the solution provided by @RubSevian is the best for now (thanks 🤗). I'm going to check with the ML-Agents team to see where this error comes from.
@RubSevian @simoninithomas Thank you a lot. It worked for me too.
Hi, I also ran into the same problem in Unit 5 (SnowballTarget). I tried the same solution from @RubSevian, but it still doesn't fix it (it did work when I used it to fix the Unit 1 problem).
Here is a screenshot of the cell's output after I applied @RubSevian's solution:
"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)"
@iyaijuil Based on your error, my guess is that the problem is in device selection. Perhaps you need to specify explicitly whether to use the CPU or the GPU (CUDA).
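To illustrate what this error usually means, here is a minimal sketch in plain PyTorch. It is not ML-Agents code: the layer sizes and the names `model` and `obs` are made up for the example. On a GPU runtime, passing a CPU tensor to a layer whose weights live on `cuda:0` is one common way to get the "Expected all tensors to be on the same device" message from `addmm`; moving the input to the model's device avoids it.

```python
import torch
from torch import nn

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical tiny network; its weights are moved to the chosen device.
model = nn.Linear(4, 2).to(device)

# Tensors are created on the CPU unless a device is given.
obs = torch.zeros(1, 4)

# On a GPU runtime, model(obs) would mix cuda:0 weights with a CPU input.
# Moving the input to the same device as the model keeps them consistent.
out = model(obs.to(device))
print(out.device)
```

If the mismatch happens inside library code, as it seems to here, this only helps to locate the problem; it does not by itself patch ML-Agents.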
I've encountered the same problem with Unit 5.
Thanks for your reply. I used Google Colab to train the model. I followed the tutorial and selected the T4 GPU as my runtime type, and I'm working from a MacBook Pro M3. Could there be a conflict in this setup?
I'm encountering the same issue on Unit 5 of the Deep RL Course: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
No issues with Units 1-4.
Same for me. I don't know how to explicitly set the device.
I've even tried adding a .to(device) to each forward function in mlagents/trainers/torch_entities/networks.py, but then a different error shows up (something about an ambiguous bool).
Actually, you don't need to train on a GPU. It took me 12 minutes to train the model on a CPU in Colab. That way, you won't run into these errors.
Thank you so much! This worked!
Looks like the proposed fix (changing torch.cuda.FloatTensor to torch.float32) was merged upstream in ml-agents.
But it doesn't work for me either. I experienced the same thing @iyaijuil described.
In the end I just ran the experiment on the CPU by setting an environment variable:
!CUDA_VISIBLE_DEVICES='' mlagents-learn ./config/ppo/SnowballTarget.yaml --env=./training-envs-executables/linux/SnowballTarget/SnowballTarget --run-id="SnowballTarget1" --no-graphics
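For anyone who prefers to launch the run from a Python cell, here is a rough equivalent of the command above. It is only a sketch: the helper names `cpu_only_env` and `run_training` are made up for this example, and the paths are copied from the command as posted. Setting `CUDA_VISIBLE_DEVICES` to an empty string hides all GPUs from the child process, so PyTorch falls back to the CPU.

```python
import os
import subprocess

def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty string: no GPU is visible
    return env

# Same arguments as the shell command above.
CMD = [
    "mlagents-learn",
    "./config/ppo/SnowballTarget.yaml",
    "--env=./training-envs-executables/linux/SnowballTarget/SnowballTarget",
    "--run-id=SnowballTarget1",
    "--no-graphics",
]

def run_training():
    """Start training on the CPU; call this from a notebook cell."""
    return subprocess.run(CMD, env=cpu_only_env(), check=True)
```

Calling `run_training()` should behave like the `!CUDA_VISIBLE_DEVICES=''` prefix, since the variable is set only for that one process and the rest of the notebook keeps its GPU.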
For me, it took around 8 minutes to train for 200k steps on the Colab CPU, so I agree with @MojtabaAbdi: just run on the CPU and that's it.
[INFO] SnowballTarget. Step: 200000. Time Elapsed: 443.264 s. Mean Reward: 25.114. Std of Reward: 2.328. Training.
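As a quick check of that timing, the summary line above can be parsed with a small script. This is a sketch that assumes the log format is exactly as shown in that one line; the regular expression and variable names are made up for the example.

```python
import re

LOG_LINE = (
    "[INFO] SnowballTarget. Step: 200000. Time Elapsed: 443.264 s. "
    "Mean Reward: 25.114. Std of Reward: 2.328. Training."
)

# Capture the step count, the elapsed seconds, and the mean reward.
PATTERN = re.compile(
    r"Step: (\d+)\. Time Elapsed: (\d+(?:\.\d+)?) s\. "
    r"Mean Reward: (-?\d+(?:\.\d+)?)\."
)

match = PATTERN.search(LOG_LINE)
steps = int(match.group(1))
seconds = float(match.group(2))
mean_reward = float(match.group(3))

minutes = seconds / 60
print(f"{steps} steps in {minutes:.1f} min, mean reward {mean_reward}")
```

443.264 s works out to a little under 7.4 minutes of training time, in line with the "around 8 minutes" figure once startup overhead is included.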
### Bonus Unit 1 Notebook Error

Hello. I have a problem executing the code in Bonus Unit 1. It comes from this line, which, honestly, I have not modified in any way:
!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id="Huggy2" --no-graphics
Below is a screenshot of the cell's output:
For context, I copied the Bonus Unit 1 notebook to my Google Drive and ran it from there.