ylsung / Ladder-Side-Tuning

PyTorch codes for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning"

Memory results not matching with the paper #13

Closed. yeshwanthv5 closed this issue 1 year ago

yeshwanthv5 commented 1 year ago

Hello

I was wondering if anyone has tried to reproduce the GPU memory utilization results from the paper. I am using the default hyperparameters but am not able to reproduce the same numbers.

Below are the outputs I get when running RTE with: bash scripts/ladder_side_tuning_base.sh "0" "rte"

For training:

***** train metrics *****
  epoch                      =        2.0
  init_mem_cpu_alloc_delta   =      261MB
  init_mem_cpu_peaked_delta  =      524MB
  init_mem_gpu_alloc_delta   =      865MB
  init_mem_gpu_peaked_delta  =        0MB
  train_mem_cpu_alloc_delta  =     1085MB
  train_mem_cpu_peaked_delta =        0MB
  train_mem_gpu_alloc_delta  =       46MB
  train_mem_gpu_peaked_delta =     4234MB
  train_runtime              = 0:00:59.58
  train_samples              =       2490
  train_samples_per_second   =      0.839
Memory utilization 5.1456630859375 GB

For eval:

***** eval metrics *****
  epoch                     =                2.0
  eval_accuracy             =            54.3478
  eval_average_metrics      = 54.347826086956516
  eval_loss                 =             0.2978
  eval_mem_cpu_alloc_delta  =                0MB
  eval_mem_cpu_peaked_delta =                0MB
  eval_mem_gpu_alloc_delta  =                0MB
  eval_mem_gpu_peaked_delta =             1136MB
  eval_runtime              =         0:00:01.26
  eval_samples_per_second   =            109.105
Memory utilization 2.04748828125 GB

The published results are 5.5GB for training and 0.88GB for inference (Table 1 in the paper). The percentage of trainable parameters matches at 1.74% (also from Table 1). AFAIK the memory numbers shouldn't depend on the platform beyond a reasonable margin of error. I'm using a single V100 GPU with 16GB of memory. Please let me know if I am missing something. Thank you!
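For context, the "Memory utilization ... GB" lines above appear to report peak GPU memory. A minimal sketch of how such a number can be obtained with PyTorch; this is an assumption, not necessarily the script's exact implementation:

```python
import torch

# Peak GPU memory allocated on device 0, converted to GB.
# Assumed to be roughly what the "Memory utilization ... GB" line reflects;
# the repo may instead track reserved memory or use the HF trainer's metrics.
peak_bytes = torch.cuda.max_memory_allocated(device=0)
print(f"Memory utilization {peak_bytes / 2**30} GB")
```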

yeshwanthv5 commented 1 year ago

For evaluation with a batch size of 1, I'm getting the following results.

***** eval metrics *****
  epoch                     =                2.0
  eval_accuracy             =            54.3478
  eval_average_metrics      = 54.347826086956516
  eval_loss                 =             0.3541
  eval_mem_cpu_alloc_delta  =                0MB
  eval_mem_cpu_peaked_delta =                0MB
  eval_mem_gpu_alloc_delta  =                0MB
  eval_mem_gpu_peaked_delta =               11MB
  eval_runtime              =         0:01:04.25
  eval_samples_per_second   =              2.148
Memory utilization 0.92277490234375 GB

While this is close to the reported results, I am still wondering what I am missing.

Adding the parameter counts for reference:

08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total trainable parameters 3878712
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total traianable bias parameters 0
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total trainable layernorm parameters 4320
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total parameters 226760760
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   For adapters/prompt-tuning, total params 1.1392202569854348
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   For intrinsic, total params 1.0174025321231794
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total trainable params 1.7402532123179344
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total trainable bias params 0.0
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total trainable layernorm params 0.1113771788160606
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils -   Total lm_head params 11.060917746053732
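For reference, the 1.74% figure above is the ratio of trainable to total parameters. A minimal sketch of how such a percentage can be computed for any PyTorch model; this is an assumption, not the repo's exact utility function:

```python
import torch.nn as nn

def trainable_param_percentage(model: nn.Module) -> float:
    """Percentage of model parameters that receive gradients."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return 100.0 * trainable / total
```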
yeshwanthv5 commented 1 year ago

I found the solution: the script should be run with training disabled in the config file. In seq2seq/configs/side_transformers.json, set "do_train": false and "per_device_eval_batch_size": 1 to get numbers that match those reported in the paper.
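A minimal sketch of the two fields to change in seq2seq/configs/side_transformers.json; all other keys in the shipped config are assumed to stay as they are:

```json
{
  "do_train": false,
  "per_device_eval_batch_size": 1
}
```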