Closed — segalinc closed this issue 3 years ago
Thanks for reporting @segalinc!
So the first issue seems to be related to https://github.com/pytorch/pytorch/issues/46983. I don't think we ever add anything to a ParameterList that's not an nn.Parameter, but I will double-check. That issue should be fixed in PyTorch 1.7.1 (https://github.com/pytorch/pytorch/issues/49285).
The other issue is new to me, I'll have a look and see why this happens.
As a side note, you may want to use nn.parallel.DistributedDataParallel instead of just DataParallel: https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead
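The DDP pattern looks roughly like the sketch below. This is a minimal single-process illustration using the CPU "gloo" backend; a real multi-GPU run would spawn one process per GPU (e.g. via torchrun) with the "nccl" backend, and the `nn.Linear` stand-in replaces whatever model is actually being trained.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for illustration only; with multiple GPUs you would
# launch one process per device and pass device_ids=[rank] to DDP.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(8, 4)   # stand-in for the actual model
ddp_model = DDP(model)    # on GPU: DDP(model.to(rank), device_ids=[rank])

out = ddp_model(torch.randn(2, 8))
dist.destroy_process_group()
```

Unlike DataParallel, DDP keeps one model replica per process and synchronizes gradients, which avoids the per-forward replication that triggers the ParameterList problem discussed below in this thread.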
I actually have PyTorch 1.7.1. I also tried updating it, but the error is still there...
Ok, so the error you got was coming from the way we handled thread-safe tenalg backend setting in TensorLy and is fixed by https://github.com/tensorly/tensorly/commit/f0b701eba6e01b3195895ff09f975c05a6b7dd14
However, there seems to still be an issue, seemingly related to https://github.com/pytorch/pytorch/issues/36035. It seems the parameters in the factors
ParameterList are not copied to the devices -- let me know if you also experience this.
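A toy repro of this symptom can be sketched as below. `FactorizedLayer` is a hypothetical stand-in for a TRL-style layer that keeps its factors in an `nn.ParameterList`; on affected PyTorch versions, wrapping it in `nn.DataParallel` produces replicas whose `factors` list is empty, per pytorch/pytorch#36035. The DataParallel part is guarded since it needs multiple GPUs.

```python
import torch
import torch.nn as nn

class FactorizedLayer(nn.Module):
    """Toy stand-in for a layer storing its factors in an nn.ParameterList."""
    def __init__(self):
        super().__init__()
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(3, 3)) for _ in range(2)]
        )

    def forward(self, x):
        # Contract the input with each factor in turn.
        for f in self.factors:
            x = x @ f
        return x

model = FactorizedLayer()
x = torch.randn(4, 3)
y = model(x)  # works fine on a single device

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    dp = nn.DataParallel(model.cuda())
    # On affected versions, the replicas see an empty `factors` list,
    # which produces the warning reported in this issue.
    dp(x.cuda())
```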
For the first issue, I will try to update the package and hopefully it's fixed... Thank you!
For the second issue, that's exactly what happens: the parameters are empty and I get the warning.
Cristina
Thanks, I've commented on the PyTorch issue at https://github.com/pytorch/pytorch/issues/36035#issuecomment-770123115, but it seems they are not actively working on this.
I pushed a temporary fix in 38d2614. Let me know if this doesn't fix your problem, @segalinc.
Hi, I also just encountered the second issue when trying multiple GPUs with torch.nn.DataParallel, in both PyTorch 1.7 and 1.8. Any recommendations?
If your issue is with PyTorch, I recommend commenting on the corresponding issue: https://github.com/pytorch/pytorch/issues/36035#issuecomment-835104279
In TensorLy-Torch we use a custom ParameterList, so feel free to try it for your application!
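The idea behind such a workaround can be sketched as follows. This is a hypothetical minimal version, not TensorLy-Torch's actual implementation: each factor is registered directly as a named parameter on the module, so replication mechanisms that miss the contents of `nn.ParameterList` still copy them along with the module's other attributes.

```python
import torch
import torch.nn as nn

class SimpleParamList(nn.Module):
    """Minimal sketch of an nn.ParameterList replacement (assumed names,
    not TensorLy-Torch's real code): factors are registered as direct
    attributes so that module replication copies them."""
    def __init__(self, parameters=None):
        super().__init__()
        self.n_params = 0
        if parameters is not None:
            for p in parameters:
                self.append(p)

    def append(self, param):
        # Registering under a plain attribute name makes the parameter
        # visible to state_dict(), .to(), and replication logic.
        self.register_parameter(f"factor_{self.n_params}", param)
        self.n_params += 1

    def __getitem__(self, index):
        return getattr(self, f"factor_{index}")

    def __len__(self):
        return self.n_params

    def __iter__(self):
        return (self[i] for i in range(self.n_params))

factors = SimpleParamList([nn.Parameter(torch.randn(2, 2)) for _ in range(3)])
```

Because each factor is an ordinary registered parameter, `factors.parameters()` and `factors.state_dict()` see all of them, which is exactly what `nn.ParameterList` failed to guarantee under DataParallel in the affected PyTorch versions.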
Hi,
when using TuckerTRL I get this warning when running on either one or multiple GPUs.
Using one GPU:
Using multiple GPUs:
And then when training, again with multiple GPUs, I get this error: