For evaluation with a batch size of 1, I'm getting the following results:
***** eval metrics *****
epoch = 2.0
eval_accuracy = 54.3478
eval_average_metrics = 54.347826086956516
eval_loss = 0.3541
eval_mem_cpu_alloc_delta = 0MB
eval_mem_cpu_peaked_delta = 0MB
eval_mem_gpu_alloc_delta = 0MB
eval_mem_gpu_peaked_delta = 11MB
eval_runtime = 0:01:04.25
eval_samples_per_second = 2.148
Memory utilization 0.92277490234375 GB
While this is close to the reported results, I am still wondering what I am missing.
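For anyone comparing their own numbers, here is a minimal sketch of how a peak-GPU-memory figure like the one above can be read from the torch.cuda allocator. This is an assumption on my part; I haven't checked how seq2seq.utils actually computes "Memory utilization":

```python
import torch

# Reset the allocator's peak-memory counter before the measured phase.
torch.cuda.reset_peak_memory_stats()

# ... run evaluation here ...

# Peak bytes allocated by PyTorch on the current device since the reset.
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Memory utilization {peak_gb} GB")
```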
Adding the parameter counts for reference:
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total trainable parameters 3878712
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total trainable bias parameters 0
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total trainable layernorm parameters 4320
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total parameters 226760760
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - For adapters/prompt-tuning, total params 1.1392202569854348
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - For intrinsic, total params 1.0174025321231794
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total trainable params 1.7402532123179344
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total trainable bias params 0.0
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total trainable layernorm params 0.1113771788160606
08/09/2023 23:02:04 - INFO - seq2seq.utils.utils - Total lm_head params 11.060917746053732
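For context, percentages like these are usually per-group parameter counts divided by the total. A hypothetical sketch of that bookkeeping follows; the exact grouping in seq2seq.utils.utils may differ (e.g. in how lm_head is counted):

```python
import torch.nn as nn

def log_param_stats(model: nn.Module) -> None:
    # Count all parameters and the trainable subset.
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    # Trainable parameters that live inside LayerNorm modules.
    layernorm = sum(
        p.numel()
        for m in model.modules()
        if isinstance(m, nn.LayerNorm)
        for p in m.parameters()
        if p.requires_grad
    )
    print(f"Total trainable parameters {trainable}")
    print(f"Total parameters {total}")
    print(f"Total trainable params {100.0 * trainable / total}")
    print(f"Total trainable layernorm params {100.0 * layernorm / total}")
```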
I found the solution: run the script with training disabled in the config file. In seq2seq/configs/side_transformers.json, set "do_train": false and "per_device_eval_batch_size": 1 to get numbers matching those reported in the paper, as shown in the sketch below.
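For concreteness, the two overrides in seq2seq/configs/side_transformers.json would look roughly like this (all other keys omitted; this is illustrative, not the full config):

```json
{
  "do_train": false,
  "per_device_eval_batch_size": 1
}
```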
Hello
I was wondering if anyone has tried to reproduce the GPU memory utilization results from the paper. I am using the default hyperparameters but am not able to reproduce the same numbers.
Below are the outputs I get when running RTE:
bash scripts/ladder_side_tuning_base.sh "0" "rte"
For training: [output screenshot]
For eval: [output screenshot]
The published results are 5.5GB for training and 0.88GB for inference (Table 1 in the paper). The percentage of trainable parameters matches at 1.74% (also from Table 1). AFAIK this shouldn't depend on the platform beyond a reasonable margin of error. I'm using a single V100 GPU with 16GB of memory. Please let me know if I'm missing something. Thank you!