pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

DPMultiheadAttention is not a full drop-in replacement for nn.MultiheadAttention #596

jfb54 closed this issue 7 months ago

jfb54 commented 1 year ago

🐛 Bug

Not only is the API missing the batch_first parameter (https://github.com/pytorch/opacus/issues/512), it also lacks the in_proj_weight attribute. This makes it impossible to use Opacus with a transformer whose initialization code accesses in_proj_weight directly.
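For illustration, a minimal sketch of the incompatibility, assuming the common pattern of initializing in_proj_weight directly; nothing is assumed about DPMultiheadAttention's internals other than that the fused attribute was absent at the time of this report:

```python
import torch.nn as nn
from opacus.layers import DPMultiheadAttention

embed_dim, num_heads = 16, 4

# Stock PyTorch attention exposes the fused projection weight, so the
# usual transformer init code works:
mha = nn.MultiheadAttention(embed_dim, num_heads)
nn.init.xavier_uniform_(mha.in_proj_weight)

# The Opacus replacement (as of Opacus 1.4) does not expose it, so the
# same init code breaks:
dp_mha = DPMultiheadAttention(embed_dim, num_heads)
print(hasattr(dp_mha, "in_proj_weight"))          # False
# nn.init.xavier_uniform_(dp_mha.in_proj_weight)  # AttributeError
```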

Expected behavior

The in_proj_weight attribute should be accessible so that it can be initialized, as it is on nn.MultiheadAttention.

Environment

Opacus 1.4, PyTorch 2.0.1

HuanyuZhang commented 1 year ago

Thanks for reporting the bug. We will fix this together with #512 to make it work.

tranvansang commented 1 year ago

I am facing the same issue. In addition to what is described in the issue description, the forward() call also lacks the is_causal parameter.
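For context, a hedged sketch of the forward() gap, assuming PyTorch 2.x where nn.MultiheadAttention.forward accepts is_causal (together with an explicit attn_mask); the DPMultiheadAttention call is commented out because it did not accept the keyword at the time of this comment:

```python
import torch
import torch.nn as nn
from opacus.layers import DPMultiheadAttention

embed_dim, num_heads, seq_len = 16, 4, 8
x = torch.randn(seq_len, 1, embed_dim)  # (seq, batch, embed); batch_first is also unsupported, see #512
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

# PyTorch 2.x accepts the is_causal hint alongside the causal attn_mask:
mha = nn.MultiheadAttention(embed_dim, num_heads)
out, _ = mha(x, x, x, attn_mask=causal_mask, is_causal=True)

# The Opacus replacement does not, so the same call raises a TypeError:
dp_mha = DPMultiheadAttention(embed_dim, num_heads)
# dp_mha(x, x, x, attn_mask=causal_mask, is_causal=True)  # TypeError: unexpected keyword argument
```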

HuanyuZhang commented 7 months ago

Closing this issue, since we landed the fixes in https://github.com/pytorch/opacus/pull/598. By the way, it is also feasible to use https://github.com/lxuechen/private-transformers for Hugging Face transformers.