Describe the bug
This issue was forked at the request of @tjruwase from: https://github.com/microsoft/DeepSpeed/issues/4047#issuecomment-1657086291
DeepSpeed-Infinity's activation checkpointing option is not available when using HF models (only Megatron-DeepSpeed supports the full set of DeepSpeed-Infinity options). As a workaround, I passed the `--gradient_checkpointing` flag directly to the HF argument parser. However, I then ran into `AssertionError: param 773 already assigned swap buffer id 47`. The error went away only when I disabled all of the following ZeRO stage 3 options: `stage3_max_live_parameters`, `stage3_max_reuse_distance`, and `stage3_prefetch_bucket_size`.
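For context, here is a minimal sketch of the kind of setup that triggers this. It is not my exact workload (the real script and config are linked in the referenced issue); the option values shown are DeepSpeed's defaults and the NVMe path is an illustrative assumption:

```python
# Sketch (assumed values): HF Trainer arguments + ZeRO-3 with NVMe parameter
# offload (DeepSpeed-Infinity), plus the three options that had to be
# disabled to avoid "param ... already assigned swap buffer id ...".
from transformers import TrainingArguments

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {
        "stage": 3,
        # NVMe parameter offload; the path is an assumption for illustration.
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
        # The assertion only disappeared after removing these three options
        # (values shown are the DeepSpeed defaults):
        "stage3_max_live_parameters": int(1e9),
        "stage3_max_reuse_distance": int(1e9),
        "stage3_prefetch_bucket_size": int(5e8),
    },
}

args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,  # the --gradient_checkpointing workaround
    deepspeed=ds_config,
)
```

These three knobs control ZeRO-3's parameter prefetching and caching, so my guess is that the interaction between prefetching and the NVMe swap buffers is where the double assignment comes from, but I have not verified this.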
To Reproduce
Workload
Run script
DeepSpeed config (config.json)
Expected behavior
ds_report output
Screenshots
Attached above.
System info (please complete the following information):
Pip requirements
Launcher context
Are you launching your experiment with the `deepspeed` launcher, MPI, or something else?
Docker context
Are you using a specific docker image that you can share?