pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

Opacus with torch_geometric.nn and GCNs #588

Open sagerkudrick opened 1 year ago

sagerkudrick commented 1 year ago

Does Opacus work with GCNConv?

I'm attempting to use Opacus with a GCN, with the model defined as follows:

import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv, global_mean_pool

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.conv3 = GCNConv(hidden_channels, hidden_channels)
        self.lin = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        x = x.relu()
        x = self.conv3(x, edge_index)

        x = global_mean_pool(x, batch)

        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin(x)

        return x

When training, however, I'm running into

Traceback (most recent call last):
  File "c:\Users\me\Desktop\github\opacus_graph\prt_3.py", line 121, in <module>
    train()
  File "c:\Users\me\Desktop\github\opacus_graph\prt_3.py", line 89, in train
    loss.backward()  # Derive gradients.
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 69, in __call__
    return self.hook(module, *args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\opacus\grad_sample\grad_sample_module.py", line 337, in capture_backprops_hook
    grad_samples = grad_sampler_fn(module, activations, backprops)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\opacus\grad_sample\functorch.py", line 58, in ft_compute_per_sample_gradient
    per_sample_grads = layer.ft_compute_sample_grad(
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 434, in wrapped
    return _flat_vmap(
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 39, in fn
    return f(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 619, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\eager_transforms.py", line 1380, in wrapper
    results = grad_and_value(func, argnums, has_aux=has_aux)(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 39, in fn
    return f(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\eager_transforms.py", line 1245, in wrapper
    output = func(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\opacus\grad_sample\functorch.py", line 34, in compute_loss_stateless_model
    output = flayer(params, batched_activations)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\make_functional.py", line 342, in forward
    return self.stateless_model(*args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: GCNConv.forward() missing 1 required positional argument: 'edge_index'

Occurring within

def train():
    model.train().to(device)

    for data in train_loader:  # Iterate in batches over the training dataset.
        data = data.to(device)
        out = model(data.x, data.edge_index, data.batch)  # Perform a single forward pass.
        loss = criterion(out, data.y)  # Compute the loss.
        loss = loss.to(device)
        loss.backward()  # Derive gradients.
        optimizer.step()  # Update parameters based on gradients.
        optimizer.zero_grad()  # Clear gradients.

The error is raised on loss.backward(). It's worth noting that training and evaluation work normally without privacy, but after calling

model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=401,
    target_epsilon=5,
    target_delta=0.001,
    max_grad_norm=1,
)

and training with the returned model, it begins to throw the error above.

Thank you!

marlowe518 commented 1 year ago

Same issue here. Did you address this error? : ) @SagerKudrick

sagerkudrick commented 1 year ago

Same issue here. Did you address this error? : ) @SagerKudrick

Hey @marlowe518, I did. The problem was with this:

model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    epochs=10,
    target_epsilon=5,
    target_delta=0.0001,
    max_grad_norm=255,
    batch_first=True,
)

The batch_first flag controls how Opacus interprets the input tensors: with batch_first=True it expects inputs shaped [batch_size, ...], and with batch_first=False it expects [K, batch_size, ...]. This changes how the wrapped model's inputs are handled, which threw off the positional arguments. I was able to get past this error by setting batch_first=False.
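
For reference, here's a sketch of the adjusted call. Everything except batch_first matches the snippet above, and the hyperparameters are just the ones I was using, not recommendations:

# Sketch of the adjusted call: identical to the call above except that
# batch_first is flipped. With batch_first=False, Opacus expects inputs
# shaped [K, batch_size, ...] rather than [batch_size, ...].
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    epochs=10,
    target_epsilon=5,
    target_delta=0.0001,
    max_grad_norm=255,
    batch_first=False,
)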

sagerkudrick commented 1 year ago

I'm not entirely sure whether Opacus supports graphs, though. Validating the model with the PrivacyEngine reports our GCN as valid (a validation sketch follows the traceback below), but we're running into a new error here:

File "c:\Users\me\Desktop\github\opacus_graph\tds.py", line 88, in <module>
    loss.backward()
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 69, in __call__
    return self.hook(module, *args, **kwargs)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\opacus\grad_sample\grad_sample_module.py", line 337, in capture_backprops_hook
    grad_samples = grad_sampler_fn(module, activations, backprops)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\opacus\grad_sample\functorch.py", line 58, in ft_compute_per_sample_gradient
    per_sample_grads = layer.ft_compute_sample_grad(
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 426, in wrapped
    batch_size, flat_in_dims, flat_args, args_spec = _process_batched_inputs(in_dims, args, func)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 119, in _process_batched_inputs
    return _validate_and_get_batch_size(flat_in_dims, flat_args), flat_in_dims, flat_args, args_spec
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_functorch\vmap.py", line 52, in _validate_and_get_batch_size
    raise ValueError(
ValueError: vmap: Expected all tensors to have the same size in the mapped dimension, got sizes [16, 7] for the mapped dimension
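
For what it's worth, this is roughly the validation check I mean. It's a minimal sketch that assumes the GCN class from the first post and uses opacus.validators.ModuleValidator (the same validator the PrivacyEngine relies on, as far as I can tell). An empty error list only means Opacus considers the layers compatible, not that per-sample gradient computation will work for graph batches.

# Minimal validation sketch, assuming the GCN class defined in the first post.
# Cora is loaded here only so that dataset.num_node_features / num_classes,
# which the GCN constructor reads, are defined.
from opacus.validators import ModuleValidator
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Cora', name='Cora')
model = GCN(hidden_channels=16)  # hidden size is an arbitrary choice

errors = ModuleValidator.validate(model, strict=False)
print(errors)  # [] -> Opacus reports the module as valid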

We're using the default DataLoader from torch_geometric.loader, and our loader looks like this: data_loader = DataLoader(dataset, batch_size=32, shuffle=False). (Both the torch.utils.data and the torch_geometric.loader DataLoader result in the same error.) Our dataset is:

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0].to(device)

And our trainer:

for epoch in range(10):
    for batch in data_loader:
        print("batch ", batch)
        optimizer.zero_grad()
        out = model(batch)
        out.to(device)
        loss = F.nll_loss(out, batch.y)
        loss.backward()
        optimizer.step()
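
For context, a torch_geometric batch concatenates the nodes of every graph along dimension 0, so the leading dimension of batch.x is the total node count rather than the number of examples. Opacus's per-sample gradient hooks vmap over what they assume is the per-example dimension of each layer's activations and backprops, which is likely why the mapped-dimension sizes disagree. Here is a minimal shape check, assuming the same Planetoid/Cora setup as above (Cora is a single graph, so every "batch" contains one graph):

# Shape check only; illustrates that the leading dimension is nodes, not examples.
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import DataLoader

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data_loader = DataLoader(dataset, batch_size=32, shuffle=False)

batch = next(iter(data_loader))
print(batch.num_graphs)  # 1 -- the whole dataset is a single graph
print(batch.x.shape)     # torch.Size([2708, 1433]) -- dim 0 is the node count
print(batch.y.shape)     # torch.Size([2708])
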
nhianK commented 6 months ago

I get similar behavior when I wrap one of my models with GradSampleModule(). Were you able to solve this issue? It doesn't work with batch_first=False either, i.e. GradSampleModule(model, batch_first=False).
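
For clarity, this is the kind of wrapping I mean; a minimal sketch assuming an ordinary nn.Module called model, not a full training setup:

# Wrap an existing nn.Module so that each parameter gets a .grad_sample
# attribute after backward(). batch_first tells the hooks which dimension
# of the activations is the per-example dimension; with PyG's nodes-first
# batching, neither setting seems to line up.
from opacus import GradSampleModule

gs_model = GradSampleModule(model, batch_first=False)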

Zening-Li commented 5 months ago

@SagerKudrick I have the same problem. Have you solved this issue?