Closed yashsavani closed 1 month ago

🐛 Bug

The adjoint operations in CoLA are moving the Jacobian operator from the GPU to the CPU, which can lead to performance issues and inconsistencies.

To reproduce

Code snippet to reproduce:

```python
import torch
import cola

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(100).to(dev)
fn = torch.nn.Sequential(torch.nn.Linear(100, 64), torch.nn.Linear(64, 100)).to(dev)
J = cola.ops.Jacobian(fn, x)
print(J.device, J.T.device, J.H.device, cola.ops.Adjoint(J).device)
```
Stack trace/error message:

```
cuda:0 cpu cpu cpu
```
Expected Behavior

Output should look like:

```
cuda:0 cuda:0 cuda:0 cuda:0
```
System information

Please complete the following information:
Additional context

Possibly an issue here: https://github.com/wilson-labs/cola/blob/main/cola/ops/operators.py#L361, where the device is not being used.
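To illustrate the kind of fix the linked line suggests, here is a minimal, hypothetical sketch (not CoLA's actual code): a wrapper operator whose `device` property delegates to the wrapped operator instead of defaulting to CPU, so the adjoint reports the same device as its base.

```python
import torch

class MatrixOperator:
    """Minimal stand-in for a linear operator backed by a dense matrix."""
    def __init__(self, A: torch.Tensor):
        self.A = A

    @property
    def device(self) -> torch.device:
        return self.A.device

    def matvec(self, x: torch.Tensor) -> torch.Tensor:
        return self.A @ x

class Adjoint:
    """Adjoint wrapper. The key point: `device` delegates to the wrapped
    operator rather than being (re)assigned to a CPU default."""
    def __init__(self, op: MatrixOperator):
        self.op = op

    @property
    def device(self) -> torch.device:
        return self.op.device  # inherit the base operator's device

    def matvec(self, x: torch.Tensor) -> torch.Tensor:
        return self.op.A.conj().T @ x  # A^H x

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
A = torch.randn(3, 4, device=dev)
op = MatrixOperator(A)
print(op.device, Adjoint(op).device)  # both report the same device
```

With this delegation pattern, any chain of wrappers (`Adjoint(Adjoint(op))`, transposes, etc.) reports the device of the underlying tensors, matching the expected output above.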
Thank you for pointing out this incorrect device allocation. Also, thank you for the concise and well-thought-out code snippet to reproduce it. I've just added a fix in #97.