Closed moaradwan closed 2 years ago
@moaradwan has updated the pull request. You must reimport the pull request before landing.
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
The following tables show the memory and runtime metrics from a CircleCI run using `gpu.nvidia.small.multi`.
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
groupnorm | 107520.0 | | 140288.0 | 1.3047619047619048 | 0.00040995383000020535 | | 0.000998464277499892 | 2.4355529926367363 |
instancenorm | 6345728.0 | | 7394304.0 | 1.1652412457640795 | 0.0005690672350000909 | | 0.0012307228030001341 | 2.162701922207729 |
layernorm | 28672.0 | | 37888.0 | 1.3214285714285714 | 0.0003554899874999648 | | 0.000726604483499955 | 2.043952035358034 |
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
linear | 3283968.0 | | 36903936.0 | 11.237605238540691 | 0.00041353099599993477 | | 0.0010285112289999743 | 2.487144226064584 |
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
mha | 13630464.0 | | 24162304.0 | 1.772669220945083 | 0.0012178095074999362 | | 0.0037577736325000903 | 3.0856826206050023 |
Threshold based on paper:
base_layer | memory: control | memory: dp | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|---|---|
gru | 11186176.0 | 12154368.0 | 1.0865525448553643 | 16603136.0 | 1.484254851702673 | 0.004054944954499915 | 0.05725822975399994 | 14.120593595347938 | 0.14559441694749992 | 35.905399106818614 |
Threshold based on paper:
base_layer | memory: control | memory: dp | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|---|---|
lstm | 10801152.0 | 11527680.0 | 1.06726393629124 | 18021376.0 | 1.6684679560106181 | 0.004167011488000128 | 0.051514086188000296 | 12.362357612007312 | 0.13722742892250026 | 32.93185759570818 |
Threshold based on paper:
base_layer | memory: control | memory: dp | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|---|---|
rnn | 6287360.0 | 5936640.0 | 0.9442182410423453 | 6346240.0 | 1.0093648208469055 | 0.003437245790000362 | 0.020753186856500516 | 6.03773722463366 | 0.09805997271550064 | 28.52864726775629 |
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
embedding | 24021504.0 | | 280028160.0 | 11.657394974103203 | 0.0004076045779994501 | | 0.0023867599680010014 | 5.855576941052466 |
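
For readers checking the tables, the `dp/control` and `gsm/control` columns appear to be the variant measurement divided by the control measurement. The sketch below recomputes them for the `gru` row; the helper names (`overhead_ratio`, `ratios`) are hypothetical and not taken from the benchmark code.

```python
def overhead_ratio(variant: float, control: float) -> float:
    """Return variant / control: the factor by which a variant exceeds the baseline."""
    if control <= 0:
        raise ValueError("control measurement must be positive")
    return variant / control


def ratios(measurements: dict) -> dict:
    """Map each non-control entry to its ratio against the 'control' entry."""
    control = measurements["control"]
    return {
        f"{name}/control": overhead_ratio(value, control)
        for name, value in measurements.items()
        if name != "control"
    }


# Values copied from the gru row in the tables above.
gru_memory = {"control": 11186176.0, "dp": 12154368.0, "gsm": 16603136.0}
gru_runtime = {
    "control": 0.004054944954499915,
    "dp": 0.05725822975399994,
    "gsm": 0.14559441694749992,
}

memory_ratios = ratios(gru_memory)
runtime_ratios = ratios(gru_runtime)
print(memory_ratios)
print(runtime_ratios)
```

The recomputed values should agree with the ratio columns reported for `gru`, which is a quick way to sanity-check a row after a rerun.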
Right now I have excluded only the conv layer, since it takes up to an hour when run locally.
In total the new tasks increase the pipeline execution time by ~20 minutes, with recurrent layers taking most of that time. This makes the integration pipeline total execution time 32 minutes.
Some improvements:
So far I have used the paper to infer most of the highlights. The recurrent layer validation has failed, though.
Group | Memory (threshold, measured gsm/control) | Runtime (threshold, measured gsm/control) |
---|---|---|
4: GRU | :white_check_mark: 1.5, 1.48 | :no_entry_sign: 18, 35.9 |
5: LSTM | :no_entry_sign: 1.2, 1.668 | :no_entry_sign: 16.5, 32.9 |
6: RNN | :white_check_mark: 1.5, 1.009 | :no_entry_sign: 16.5, 28.528 |
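
The table above pairs each threshold with the measured value. A minimal sketch of that comparison follows, assuming a check passes when the measured ratio is at or below the threshold (this matches the marks in the table, but the actual test logic is not shown in this thread). The `Check` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    layer: str
    metric: str
    threshold: float  # maximum allowed ratio against control
    measured: float   # measured ratio from the benchmark run

    @property
    def passed(self) -> bool:
        # Assumption: a measurement passes when it does not exceed its threshold.
        return self.measured <= self.threshold


# Threshold and measured values copied from the table above.
checks = [
    Check("gru", "memory", 1.5, 1.48),
    Check("gru", "runtime", 18.0, 35.9),
    Check("lstm", "memory", 1.2, 1.668),
    Check("lstm", "runtime", 16.5, 32.9),
    Check("rnn", "memory", 1.5, 1.009),
    Check("rnn", "runtime", 16.5, 28.528),
]

failures = [c for c in checks if not c.passed]
for c in checks:
    status = "ok" if c.passed else "FAIL"
    print(f"{c.layer:5s} {c.metric:8s} threshold={c.threshold} measured={c.measured} {status}")
```

Under this assumption, all three runtime checks and the LSTM memory check fail, which is the question raised about whether to relax the thresholds.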
@ffuuugor @ashkan-software Regarding point 2 in https://github.com/pytorch/opacus/pull/481#issuecomment-1228317342: should I just update the thresholds of the failing tests so that they pass?
The jobs will run separately under the name `micro_benchmarks_py37_torch_release_cuda`, only under `nightly`. The whole run will take ~27 minutes. There are 10 tasks, as follows.
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
groupnorm | 107520.0 | | 140288.0 | 1.3047619047619048 | 0.00040995383000020535 | | 0.000998464277499892 | 2.4355529926367363 |
instancenorm | 6345728.0 | | 7394304.0 | 1.1652412457640795 | 0.0005690672350000909 | | 0.0012307228030001341 | 2.162701922207729 |
layernorm | 28672.0 | | 37888.0 | 1.3214285714285714 | 0.0003554899874999648 | | 0.000726604483499955 | 2.043952035358034 |
mha | 13630464.0 | | 13632512.0 | 1.00015025167155 | 0.0012336450990000003 | | 0.001333286202000039 | 1.0807696663171673 |
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
linear | 3283968.0 | | 36903936.0 | 11.237605238540691 | 0.00041353099599993477 | | 0.0010285112289999743 | 2.487144226064584 |
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
mha | 13630464.0 | | 24162304.0 | 1.772669220945083 | 0.0012178095074999362 | | 0.0037577736325000903 | 3.0856826206050023 |
DPGRU:
Pipeline runtime: 2m28s
GSM-DPGRU:
base_layer | memory: control | memory: dp | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|---|---|
gru | 11186176.0 | 12154368.0 | 1.0865525448553643 | 16603136.0 | 1.484254851702673 | 0.004054944954499915 | 0.05725822975399994 | 14.120593595347938 | 0.14559441694749992 | 35.905399106818614 |
DPLSTM:
GSM-DPLSTM:
base_layer | memory: control | memory: dp | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|---|---|
lstm | 10801152.0 | 11527680.0 | 1.06726393629124 | 18021376.0 | 1.6684679560106181 | 0.004167011488000128 | 0.051514086188000296 | 12.362357612007312 | 0.13722742892250026 | 32.93185759570818 |
DPRNN:
GSM-DPRNN:
base_layer | memory: control | memory: dp | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|---|---|
rnn | 6287360.0 | 5936640.0 | 0.9442182410423453 | 6346240.0 | 1.0093648208469055 | 0.003437245790000362 | 0.020753186856500516 | 6.03773722463366 | 0.09805997271550064 | 28.52864726775629 |
Threshold based on paper:
base_layer | memory: control | memory: dp/control | memory: gsm | memory: gsm/control | runtime: control | runtime: dp/control | runtime: gsm | runtime: gsm/control |
---|---|---|---|---|---|---|---|---|
embedding | 24021504.0 | | 280028160.0 | 11.657394974103203 | 0.0004076045779994501 | | 0.0023867599680010014 | 5.855576941052466 |
@ffuuugor I updated the code as follows:
Consider the comment above for more details.
Types of changes
[ ] Docs change / refactoring / dependency upgrade
Issue: https://github.com/pytorch/opacus/issues/368
Motivation and Context / Related issue
There's a task, #368, for committing benchmark code. In this change I add these benchmarks to the CI integration tests. To choose thresholds, I ran the benchmarks locally on all the layers with batch size: 16, num_runs: 100, num_repeats: 20, forward_only: False. Please check the comment below for more details.
Using the report and section 3 in the paper, I parameterised the runtime and memory thresholds for different layers.
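
To illustrate the kind of measurement the settings quoted above drive, here is a rough CPU-only sketch using the standard library's `timeit`. It assumes that `num_runs` means calls per timed batch and `num_repeats` means the number of batches, which is a guess about the benchmark code and not confirmed in this thread. The `benchmark` function and `toy_workload` are hypothetical stand-ins; the real benchmarks time layers on GPU and also record memory.

```python
import timeit


def benchmark(fn, num_runs: int, num_repeats: int) -> float:
    """Return mean seconds per call of fn.

    Assumption: num_runs is the number of calls timed in one batch and
    num_repeats is the number of batches; the mean is taken over batches.
    """
    totals = timeit.repeat(fn, number=num_runs, repeat=num_repeats)
    per_call = [total / num_runs for total in totals]
    return sum(per_call) / len(per_call)


def toy_workload() -> int:
    # Stand-in for a layer's forward and backward pass.
    return sum(i * i for i in range(1000))


mean_seconds = benchmark(toy_workload, num_runs=100, num_repeats=20)
print(f"mean runtime per call: {mean_seconds:.6f} s")
```

Running the same function for a control and a variant and dividing the two means gives a runtime ratio of the kind that is compared against a threshold.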
How Has This Been Tested (if it applies)
circleci config process .circleci/config.yml
circleci local execute --job JOB_NAME
Checklist