blacknoon opened 2 months ago
Torch dtype bfloat16 can take up a lot of space. Have you considered quantizing components? I do this in Stable Diffusion for the T5 text encoder. First, quantize the component and save it to the directory your program runs from:
import json
from pathlib import Path

import torch
from optimum.quanto import freeze, qint8, quantization_map, quantize
from transformers import T5EncoderModel

base_model = "black-forest-labs/FLUX.1-dev"  # adjust to your base checkpoint
dtype = torch.bfloat16

# Load the T5 text encoder in bfloat16
text_encoder_2 = T5EncoderModel.from_pretrained(
    base_model,
    subfolder="text_encoder_2",
    torch_dtype=dtype,
)

# Quantize the weights to int8 and freeze them
quantize(text_encoder_2, weights=qint8)
freeze(text_encoder_2)

# Save the quantized weights plus the quantization map needed to reload them
save_directory = "./flux-dev/t5encodermodel_qint8"
text_encoder_2.save_pretrained(save_directory)
qmap_name = Path(save_directory, "quanto_qmap.json")
qmap = quantization_map(text_encoder_2)
with open(qmap_name, "w", encoding="utf8") as f:
    json.dump(qmap, f, indent=4)
print("T5 encoder done")
Then you can refer back to this component when loading your model:
from optimum.quanto import QuantizedTransformersModel
from transformers import T5EncoderModel

print("Loading quantized text_encoder_2")

# Wrapper class so quanto knows how to rebuild the serialized T5 encoder
class QuantizedT5EncoderModelForCausalLM(QuantizedTransformersModel):
    auto_class = T5EncoderModel
    auto_class.from_config = auto_class._from_config

text_encoder_2 = QuantizedT5EncoderModelForCausalLM.from_pretrained(
    "./flux-dev/t5encodermodel_qint8"
).to(dtype=dtype)  # dtype = torch.bfloat16, as above
Another thought is to load your components on the CPU and move them to CUDA (with the desired dtype) only when they are needed; this is basically CPU offload in reverse. Anything no longer needed on the GPU can then be moved back to the CPU, followed by torch.cuda.empty_cache() to release the cached memory.
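A minimal sketch of that pattern, using a placeholder nn.Linear to stand in for a real component such as the T5 encoder:

import torch
import torch.nn as nn

dtype = torch.bfloat16
component = nn.Linear(4096, 4096)  # placeholder for e.g. text_encoder_2

# Keep the component on the CPU until it is actually needed
component.to("cpu")

# Move it to the GPU, in the desired dtype, right before use
component.to("cuda", dtype=dtype)
x = torch.randn(1, 4096, device="cuda", dtype=dtype)
with torch.no_grad():
    y = component(x)

# Once it is no longer needed, move it back and release the cached blocks
component.to("cpu")
torch.cuda.empty_cache()
print(y.shape)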
Describe the bug
I've gone through all the steps to install Sora, and at the last step, running gradio/app.py, it fails about two-thirds of the way through. It hangs on loading shards at 0% and then I get the following error: "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 5.12 GiB is allocated by PyTorch, and 124.74 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)" I have no idea how to fix this.
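For reference, one way to try the allocator setting mentioned in the error is to set the environment variable before PyTorch touches the GPU, e.g. at the very top of gradio/app.py (a sketch, not a confirmed fix for a 4 GiB card):

import os

# Must run before PyTorch initializes its CUDA allocator,
# so place it before the first `import torch` in the entry script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # the allocator now picks up expandable_segments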
Have you searched existing issues? 🔎
Reproduction
Screenshot
No response
Logs
No response
System Info
Severity
Blocking usage of gradio