Open cokuehuang opened 1 year ago
Hi @cokuehuang thanks for passing the scripts along. I'm investigating a few issues showing similar behavior. I'll update once I have something concrete.
Hi @cokuehuang, the Hugging Face model paths in the scripts are incorrect and cause errors before any memory allocation occurs.
Same hardware environment, same problem. I only selected a 15B-parameter model as the actor, so how does the 30B OPT model manage to work properly?
@jomayeri the model paths are the step1 (LLaMA 13B) and step2 (LLaMA 7B) outputs saved on my local machine.
I am facing a similar issue.
[rank6]: Traceback (most recent call last):
[rank6]:   File "/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 671, in <module>
[rank6]:   File "/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 478, in main
[rank6]:     rlhf_engine = DeepSpeedRLHFEngine(
[rank6]:   File "/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/rlhf/rlhf_engine.py", line 50, in __init__
[rank6]:     self.ref = self._init_ref(
[rank6]:   File "/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/rlhf/rlhf_engine.py", line 155, in _init_ref
[rank6]:     ref_engine, *_ = deepspeed.initialize(model=ref_model,
rank6: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU has a total capacity of 79.15 GiB of which 28.62 MiB is free. Process 3126749 has 57.98 GiB memory in use. Including non-PyTorch memory, this process has 21.12 GiB memory in use. Of the allocated memory 18.41 GiB is allocated by PyTorch, and 102.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
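As the error message itself suggests, one mitigation worth trying (it reduces fragmentation but does not shrink the actual model footprint, so it may not be a complete fix here) is enabling expandable segments in the CUDA caching allocator before launching the script. A minimal sketch:

```shell
# Sketch: set the allocator option the OOM message recommends, then launch
# the same step-3 script. This only affects how PyTorch carves up reserved
# memory; the model still has to fit.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "$PYTORCH_CUDA_ALLOC_CONF"
# bash training_scripts/single_node/test_model_load.sh
```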
System Info: Memory: 500G, GPU: 8 x A100 80G
Question: Why does the init of DeepSpeedRLHFEngine use much more memory with multiple GPUs than with a single GPU?
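One possible explanation (my own back-of-envelope assumption, not something confirmed by DeepSpeed): if each rank materializes the full fp16 checkpoint before ZeRO-3 partitions it, the aggregate peak during init grows linearly with the number of ranks, which would match the roughly 500G vs 80G observation below. A rough sketch, using hypothetical 13B actor and 7B reference sizes:

```python
GiB = 1024 ** 3

def fp16_weight_bytes(n_params: float) -> float:
    """fp16 stores 2 bytes per parameter (weights only, no optimizer state)."""
    return n_params * 2

def peak_if_each_rank_loads_full_model(n_params: float, n_ranks: int) -> float:
    """If every rank loads the full checkpoint before sharding, the
    aggregate peak grows linearly with the rank count."""
    return fp16_weight_bytes(n_params) * n_ranks

# Hypothetical: 13B actor + 7B reference model, loaded by all 8 ranks at once.
total = peak_if_each_rank_loads_full_model(13e9 + 7e9, 8) / GiB
print(f"{total:.0f} GiB")  # ~298 GiB aggregate peak vs ~37 GiB on one rank
```

With one rank the same math gives about 37 GiB, so the multi-GPU case can transiently need roughly `n_ranks` times the single-GPU footprint until partitioning completes.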
Reproduce:
1. Copy model_load.py to DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning
2. Copy test_model_load.sh to DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_scripts/single_node

Test with 8 GPUs:
cd DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning
bash training_scripts/single_node/test_model_load.sh
Max memory used: 500G. Logs:
Test with 1 GPU:
cd DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning
CUDA_VISIBLE_DEVICES=0 bash training_scripts/single_node/test_model_load.sh
Max memory used: 80G. Logs:
files.zip