pytorch / opacus

Training PyTorch models with differential privacy
https://opacus.ai
Apache License 2.0

Add benchmarks to CI #481

Closed moaradwan closed 2 years ago

moaradwan commented 2 years ago

Types of changes

Using the report and section 3 in the paper, I parameterised the runtime and memory thresholds for different layers.

How Has This Been Tested (if it applies)

facebook-github-bot commented 2 years ago

@moaradwan has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 2 years ago

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

moaradwan commented 2 years ago

Results after running on GPU

The following tables show the memory and runtime metrics from a CircleCI run on gpu.nvidia.small.multi.

Group 1: groupnorm, instancenorm, layernorm, dpmha

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| groupnorm | 107520.0 | | 140288.0 | 1.3047619047619048 | 0.00040995383000020535 | | 0.000998464277499892 | 2.4355529926367363 |
| instancenorm | 6345728.0 | | 7394304.0 | 1.1652412457640795 | 0.0005690672350000909 | | 0.0012307228030001341 | 2.162701922207729 |
| layernorm | 28672.0 | | 37888.0 | 1.3214285714285714 | 0.0003554899874999648 | | 0.000726604483499955 | 2.043952035358034 |
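
As a quick sanity check on how the ratio columns relate to the raw columns, the sketch below recomputes `gsm/control` for the groupnorm row. It assumes the ratio is simply the GSM value divided by the control value; the helper name `overhead` is hypothetical and not part of the benchmark code.

```python
# Raw values copied from the groupnorm row of the table above.
memory_control = 107520.0
memory_gsm = 140288.0
runtime_control = 0.00040995383000020535
runtime_gsm = 0.000998464277499892


def overhead(variant: float, control: float) -> float:
    """Relative cost of a wrapped layer compared to the plain control layer.

    Hypothetical helper: assumes the reported ratio is variant / control.
    """
    return variant / control


memory_ratio = overhead(memory_gsm, memory_control)
runtime_ratio = overhead(runtime_gsm, runtime_control)

print(f"memory gsm/control:  {memory_ratio:.4f}")
print(f"runtime gsm/control: {runtime_ratio:.4f}")
```

Both results agree with the `gsm/control` values reported for groupnorm, which suggests the thresholds are applied to these ratios rather than to the raw measurements.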

Group 2: Linear layer

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| linear | 3283968.0 | | 36903936.0 | 11.237605238540691 | 0.00041353099599993477 | | 0.0010285112289999743 | 2.487144226064584 |

Group 3: GSM-DPMHA

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| mha | 13630464.0 | | 24162304.0 | 1.772669220945083 | 0.0012178095074999362 | | 0.0037577736325000903 | 3.0856826206050023 |

Group 4: GRU

Threshold based on paper:

| base_layer | memory control | memory dp | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|---|---|
| gru | 11186176.0 | 12154368.0 | 1.0865525448553643 | 16603136.0 | 1.484254851702673 | 0.004054944954499915 | 0.05725822975399994 | 14.120593595347938 | 0.14559441694749992 | 35.905399106818614 |

Group 5: LSTM

Threshold based on paper:

| base_layer | memory control | memory dp | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|---|---|
| lstm | 10801152.0 | 11527680.0 | 1.06726393629124 | 18021376.0 | 1.6684679560106181 | 0.004167011488000128 | 0.051514086188000296 | 12.362357612007312 | 0.13722742892250026 | 32.93185759570818 |

Group 6: RNN

Threshold based on paper:

| base_layer | memory control | memory dp | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|---|---|
| rnn | 6287360.0 | 5936640.0 | 0.9442182410423453 | 6346240.0 | 1.0093648208469055 | 0.003437245790000362 | 0.020753186856500516 | 6.03773722463366 | 0.09805997271550064 | 28.52864726775629 |

Group 7: Embedding

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| embedding | 24021504.0 | | 280028160.0 | 11.657394974103203 | 0.0004076045779994501 | | 0.0023867599680010014 | 5.855576941052466 |

Open points

1. Reducing runtime of the jobs

Right now I have excluded only the conv layer, since it takes up to an hour when run locally.

In total, the new tasks increase the pipeline execution time by ~20 minutes, with the recurrent layers taking most of that time. This brings the total execution time of the integration pipeline to 32 minutes.

Some improvements:

2. Changing thresholds

Currently I have used the paper to infer most of the thresholds. However, validation failed for the recurrent layers.

| Group | Memory | Threshold - Hi Memory | Runtime | Threshold - Hi Runtime |
|---|---|---|---|---|
| 4: GRU | :white_check_mark: | 1.5, 1.48 | :no_entry_sign: | 18, 35.9 |
| 5: LSTM | :no_entry_sign: | 1.2, 1.668 | :no_entry_sign: | 16.5, 32.9 |
| 6: RNN | :white_check_mark: | 1.5, 1.009 | :no_entry_sign: | 16.5, 28.528 |
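
The pass/fail marks above can be reproduced with a small comparison. This is only a sketch: it assumes a check passes when the measured ratio is at most the threshold, and the `Check` class is hypothetical rather than taken from the benchmark code in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One (threshold, measured) pair from the table above."""

    group: str
    metric: str
    threshold: float
    measured: float

    @property
    def passed(self) -> bool:
        # Assumption: a measured ratio at or below the threshold passes.
        return self.measured <= self.threshold


# Threshold and measured values copied from the recurrent-layer table.
checks = [
    Check("4: GRU", "memory", 1.5, 1.48),
    Check("4: GRU", "runtime", 18, 35.9),
    Check("5: LSTM", "memory", 1.2, 1.668),
    Check("5: LSTM", "runtime", 16.5, 32.9),
    Check("6: RNN", "memory", 1.5, 1.009),
    Check("6: RNN", "runtime", 16.5, 28.528),
]

failed = [c for c in checks if not c.passed]

for c in checks:
    status = "ok" if c.passed else "FAIL"
    print(f"{c.group:8} {c.metric:8} {c.measured:>7} vs {c.threshold:>5}: {status}")
```

Under that assumption, the four failing checks are the three runtime checks and the LSTM memory check, matching the :no_entry_sign: entries in the table.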

moaradwan commented 2 years ago

@ffuuugor @ashkan-software Regarding point 2 in https://github.com/pytorch/opacus/pull/481#issuecomment-1228317342, should I just update the thresholds of the failing tests so that they pass?

moaradwan commented 2 years ago

Final results after updating thresholds

The jobs will run separately under the name micro_benchmarks_py37_torch_release_cuda, in the nightly workflow only. The whole run takes ~27 minutes. There are 10 tasks, as follows.

Group 1: GSM of: (groupnorm, instancenorm, layernorm), and DPMHA

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| groupnorm | 107520.0 | | 140288.0 | 1.3047619047619048 | 0.00040995383000020535 | | 0.000998464277499892 | 2.4355529926367363 |
| instancenorm | 6345728.0 | | 7394304.0 | 1.1652412457640795 | 0.0005690672350000909 | | 0.0012307228030001341 | 2.162701922207729 |
| layernorm | 28672.0 | | 37888.0 | 1.3214285714285714 | 0.0003554899874999648 | | 0.000726604483499955 | 2.043952035358034 |
| mha | 13630464.0 | | 13632512.0 | 1.00015025167155 | 0.0012336450990000003 | | 0.001333286202000039 | 1.0807696663171673 |

Group 2: GSM-Linear layer

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| linear | 3283968.0 | | 36903936.0 | 11.237605238540691 | 0.00041353099599993477 | | 0.0010285112289999743 | 2.487144226064584 |

Group 3: GSM-DPMHA

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| mha | 13630464.0 | | 24162304.0 | 1.772669220945083 | 0.0012178095074999362 | | 0.0037577736325000903 | 3.0856826206050023 |

Group 4&5: DPGRU and GSM-DPGRU

DPGRU:

Group 6&7: DPLSTM and GSM-DPLSTM

DPLSTM:

GSM-DPLSTM:

| base_layer | memory control | memory dp | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|---|---|
| lstm | 10801152.0 | 11527680.0 | 1.06726393629124 | 18021376.0 | 1.6684679560106181 | 0.004167011488000128 | 0.051514086188000296 | 12.362357612007312 | 0.13722742892250026 | 32.93185759570818 |

Group 8&9: DPRNN and GSM-DPRNN

DPRNN:

GSM-DPRNN:

| base_layer | memory control | memory dp | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|---|---|
| rnn | 6287360.0 | 5936640.0 | 0.9442182410423453 | 6346240.0 | 1.0093648208469055 | 0.003437245790000362 | 0.020753186856500516 | 6.03773722463366 | 0.09805997271550064 | 28.52864726775629 |

Group 10: Embedding

Threshold based on paper:

| base_layer | memory control | memory dp/control | memory gsm | memory gsm/control | runtime control | runtime dp/control | runtime gsm | runtime gsm/control |
|---|---|---|---|---|---|---|---|---|
| embedding | 24021504.0 | | 280028160.0 | 11.657394974103203 | 0.0004076045779994501 | | 0.0023867599680010014 | 5.855576941052466 |

moaradwan commented 2 years ago

@ffuuugor I updated the code as follows:

See the comment above for more details.