DeepSpeed-MII library issues

gayatripadmani commented 1 month ago

I tried the DeepSpeed-MII library to create a pipeline in a Jupyter notebook on a GPU with CUDA compute capability 8.0+, but it gives me this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 19.75 GiB. GPU 0 has a total capacity of 22.03 GiB of which 19.51 GiB is free. Including non-PyTorch memory, this process has 2.52 GiB memory in use. Of the allocated memory 1.35 GiB is allocated by PyTorch, and 5.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (CUDA semantics — PyTorch 2.4 documentation)
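The message itself contains enough to diagnose the failure. The request (19.75 GiB) exceeds the free memory (19.51 GiB) by only about 0.24 GiB, and just 5.34 MiB is reserved-but-unallocated. That means fragmentation is not the problem, so the `expandable_segments` hint is unlikely to help: the process simply needs more memory than is available. Below is a minimal sketch that extracts these figures from the message text. The function name `diagnose_oom` is hypothetical, and the regexes assume the exact wording shown above (other PyTorch versions may phrase it differently).

```python
import re

# Conversion factors from each unit in the OOM message to GiB.
_UNITS = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}
_NUM = r"([\d.]+) (KiB|MiB|GiB)"


def _gib(value: str, unit: str) -> float:
    """Convert a '<value> <unit>' pair taken from the message to GiB."""
    return float(value) * _UNITS[unit]


def diagnose_oom(message: str) -> dict:
    """Hypothetical helper: pull requested/free/reserved sizes (GiB) out of a
    PyTorch CUDA OOM message worded like the one in this issue."""
    requested = _gib(*re.search(r"Tried to allocate " + _NUM, message).groups())
    free = _gib(*re.search(r"of which " + _NUM + r" is free", message).groups())
    reserved_unalloc = _gib(
        *re.search(_NUM + r" is reserved by PyTorch but unallocated", message).groups()
    )
    shortfall = requested - free
    # Rough heuristic: the message's expandable_segments advice targets the
    # case where reserved-but-unallocated memory is "large"; here "large"
    # means at least as big as the shortfall.
    fragmentation_likely = reserved_unalloc >= shortfall
    return {
        "requested_gib": requested,
        "free_gib": free,
        "reserved_unallocated_gib": reserved_unalloc,
        "shortfall_gib": shortfall,
        "fragmentation_likely": fragmentation_likely,
    }
```

Applied to the message above, this reports a shortfall of about 0.24 GiB and `fragmentation_likely` as False. Freeing memory, using a smaller model, or offloading is the more promising direction.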

If anyone knows how to solve this error, please help me.

loadams commented 1 month ago

Hi @gayatripadmani - you're running out of memory on your device. Can you share which model you are using? Alternatively, can you try a smaller model or more DeepSpeed optimizations (which ZeRO stage are you running with in your ds_config)?
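Before picking a smaller model, you can make a back-of-the-envelope estimate: the weights alone need roughly (number of parameters) × (bytes per parameter). The sketch below also builds a DeepSpeed config dict with ZeRO stage 3 and CPU parameter offload, which is one kind of optimization the ZeRO question refers to. The helper names are hypothetical. The config keys (`zero_optimization`, `stage`, `offload_param`, `train_micro_batch_size_per_gpu`) are standard DeepSpeed config keys, but whether a particular DeepSpeed-MII version accepts a `ds_config` like this is an assumption to verify against its documentation.

```python
GIB = 1024**3

# Approximate storage cost per parameter for common weight dtypes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gib(num_params: float, dtype: str = "fp16") -> float:
    """Hypothetical helper: lower bound on GPU memory for the weights, in GiB.

    Ignores activations, any KV cache and CUDA context overhead, so the real
    footprint of an inference pipeline will be higher.
    """
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits(num_params: float, free_gib: float, dtype: str = "fp16") -> bool:
    """Hypothetical helper: True if the weights alone fit into free_gib."""
    return weight_gib(num_params, dtype) <= free_gib


def zero3_offload_config(micro_batch: int = 1) -> dict:
    """Hypothetical helper: a DeepSpeed config dict using ZeRO stage 3 with
    parameters offloaded to CPU memory (standard DeepSpeed key names)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
    }
```

For example, a 7B-parameter model in fp16 needs roughly 13 GiB for its weights alone (7e9 × 2 bytes), which would fit in the 19.51 GiB reported free. A 13B model (about 24 GiB) would not fit even on an empty 22 GiB GPU without offloading or quantization.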

loadams commented 3 weeks ago

Hi @gayatripadmani - I'm going to close this as stale. Apologies for being slow to reply, but please comment if you need us to reopen this.

microsoft/DeepSpeed-MII issue #530