zhaowei-wang-nlp opened this issue 2 years ago
same problem
PyTorch allocator cache flushes are very expensive and indicate severe memory pressure. Can you try reducing the batch size?
same problem
same problem
same problem
Same issue here, any updates?
same problem
👀
same
Any update on this issue? I am using PyTorch Lightning. Originally I thought it was because I was passing too many things at each step, but after changing those, the problem is still there.
I have tried reducing the batch size and also setting pin_memory to False according to https://discuss.pytorch.org/t/when-to-set-pin-memory-to-true/19723 (some PyTorch versions have that issue), but with no luck.
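In case it helps anyone reproduce what I tried, this is roughly the DataLoader change; the dataset, batch size, and worker count here are placeholders for my real setup:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for my real training data.
train_dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))

train_loader = DataLoader(
    train_dataset,
    batch_size=4,        # reduced from my original batch size
    shuffle=True,
    num_workers=4,
    pin_memory=False,    # was True; disabled per the linked forum thread
)
```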
I used 8xA100 with the same settings and the message was gone.
Thanks @teaguexiao, I will try using more GPUs (ours are A40s with 48 GB of memory each) to see if that helps. Thanks for sharing!
I used 8xA100 40G, reduced the batch size, and waited about 20 minutes; the message is gone now.
same problem
same problem here
Using DeepspeedTorchDistributor in Databricks, loading the model with flash-attn 2.
Same issue here. I am running on 8 AMD MI250X GPUs with 128 GB VRAM.
same problem.
For those who are still concerned about this issue, try setting your train_batch_size lower. It worked for me.
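To make that suggestion concrete: DeepSpeed requires train_batch_size to equal micro_batch_per_gpu × gradient_accumulation_steps × number of GPUs, so lowering the per-GPU micro batch (and, if needed, raising gradient accumulation) is usually the lever. A rough sketch, assuming 8 GPUs and a stand-in model; the actual numbers are placeholders:

```python
import torch
import deepspeed

# Sketch: shrink the per-GPU micro batch to relieve memory pressure.
# DeepSpeed enforces:
#   train_batch_size == micro_batch_per_gpu * gradient_accumulation_steps * world_size
world_size = 8            # assumption: 8 GPUs
micro_batch_per_gpu = 1   # reduce this first
grad_accum_steps = 8      # raise this to keep the same effective batch size

ds_config = {
    "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
    "gradient_accumulation_steps": grad_accum_steps,
    "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
    "zero_optimization": {"stage": 3},
}

model = torch.nn.Linear(16, 16)   # stand-in for your real model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```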
Same problem using 8 V100 GPUs.
Same problem. I would like to know whether this issue degrades the model's performance or only affects training efficiency.
Same issue. Versions: torch 2.1.0.dev20230424+cu117, deepspeed 0.11.0.
2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
Hi everyone, I am using ZeRO stage 3 and I see the above message on every step. Training is very slow. How should I change my config to speed it up? My config:

{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 5e8,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 5e8,
    "stage3_max_reuse_distance": 5e8,
    "stage3_gather_fp16_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
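For anyone who wants to try the empty_cache() call that the warning itself suggests, here is a minimal, self-contained sketch in plain PyTorch; the tiny model, data, and optimizer are placeholders, and with a DeepSpeed engine the same call would go right after model_engine.step():

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Sketch of the warning's suggestion: call torch.cuda.empty_cache() at the
# same point in every rank's loop so allocator caches are flushed together.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 2).to(device)          # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))),
    batch_size=8,
)

for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if device == "cuda":
        torch.cuda.empty_cache()  # synchronized cache flush, as the message suggests
```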
Is there any solution here?
I have the same problem.
Same problem using 2 H100 gpus.
Same problem.
Same here.