Closed yashsavani closed 1 month ago

🐛 Bug

The adjoint operations in CoLA are moving the Jacobian operator from the GPU to the CPU, which can lead to performance issues and inconsistencies.

To reproduce

Code snippet to reproduce:

```python
import torch
import cola

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(100).to(dev)
fn = torch.nn.Sequential(torch.nn.Linear(100, 64), torch.nn.Linear(64, 100)).to(dev)
J = cola.ops.Jacobian(fn, x)
print(J.device, J.T.device, J.H.device, cola.ops.Adjoint(J).device)
```
Stack trace/error message:

```
cuda:0 cpu cpu cpu
```
Expected Behavior

Output should look like:

```
cuda:0 cuda:0 cuda:0 cuda:0
```
System information

Please complete the following information:
Additional context

Possibly an issue here: https://github.com/wilson-labs/cola/blob/main/cola/ops/operators.py#L361, where the device is not being used.
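To illustrate the kind of fix the linked line suggests, here is a minimal, hypothetical sketch (not CoLA's actual code): a wrapper operator whose `device` property delegates to the wrapped operator instead of defaulting to CPU, so the adjoint reports the same device as its base.

```python
import torch

class MatrixOperator:
    """Minimal stand-in for a linear operator backed by a dense matrix."""
    def __init__(self, A: torch.Tensor):
        self.A = A

    @property
    def device(self) -> torch.device:
        return self.A.device

    def matvec(self, x: torch.Tensor) -> torch.Tensor:
        return self.A @ x

class Adjoint:
    """Adjoint wrapper. The key point: `device` delegates to the wrapped
    operator rather than being (re)assigned to a CPU default."""
    def __init__(self, op: MatrixOperator):
        self.op = op

    @property
    def device(self) -> torch.device:
        return self.op.device  # inherit the base operator's device

    def matvec(self, x: torch.Tensor) -> torch.Tensor:
        return self.op.A.conj().T @ x  # A^H x

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
A = torch.randn(3, 4, device=dev)
op = MatrixOperator(A)
print(op.device, Adjoint(op).device)  # both report the same device
```

With this delegation pattern, any chain of wrappers (`Adjoint(Adjoint(op))`, transposes, etc.) reports the device of the underlying tensors, matching the expected output above.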
Thank you for pointing out this incorrect device allocation. Also, thank you for the concise and well-thought-out code snippet to reproduce it. I've just added a fix in #97.