pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0
1.67k stars 332 forks source link

Functorch gradients: investigation and fix #510

Closed ffuuugor closed 1 year ago

ffuuugor commented 1 year ago

The investigation part for this PR was done by @alexandresablayrolles, thanks for figuring out the reason the tests were failing

Background

Current implementation of functorch-based per sample gradients fails on modules which have both trainable non-recursive parameters and standard submodules, e.g. below

class LinearWithExtraParam(nn.Module):
    def __init__(self, in_features: int, out_features: int, hidden_dim: int = 8):
        super().__init__()
        self.fc = nn.Linear(in_features, hidden_dim)
        self.extra_param = nn.Parameter(torch.randn(hidden_dim, out_features))

    def forward(self, x):
        x = self.fc(x)
        x = x.matmul(self.extra_param)
        return x

The reason is - functorch hook actually computes gradients for recursive submodules too. The problem is, normal hooks are also attached to these submodules. GradSampleModule then sees two grad_sample tensors, thinks it needs to accumulate and adds them up together

Solution(s)

There are essentially two ways we can fix this: either make functorch compute per sample gradients for non-recursive parameters only or don't attach normal hooks to submodules where the parent module is handled by functorch.

This diff implements the latter option (reasoning below), for demo purposes the former option can be seen in #531

For the pure code perspective the former option (let's call it "non-recursive functorch") is more appealing to me. It better fits the existing paradigm and matches normal hooks behaviour - all of the existing code only deals with the immediate non-recursive parameters. However, it doesn't make much sense from the efficiency perspective. "non-recursive functorch" would do all the work to compute per-sample gradients for its submodules, only for them to be filtered out at the very last stage. Alternative option (a.k.a. "functorch for subtrees") does involve a bit more convoluted

This has a noticeable effect on performance. Below is the results of MNIST benchmarks with different configurations. I've tested this with different configurations, because at the end of the day, the impact on performance depends on how deep are subtrees

Mode non-recursive functorch functorch for subtrees
Standard model (CPU) 138s 136s
Standard model (GPU) 149s 150s
Mid-level model (CPU) 157s 150s
Mid-level model (GPU) 100s 97s
Extreme model (CPU) 207s 172s
Extreme model (GPU) 101s 94s
facebook-github-bot commented 1 year ago

@ffuuugor has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@ffuuugor has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@ffuuugor has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@ffuuugor has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@ffuuugor has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@ffuuugor has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 year ago

@ffuuugor has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@ffuuugor has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.