VRAM usage in bf16 is around 12 GB. On a 24 GB GPU such as an RTX 4090, that leaves roughly 12 GB unused. It would be great if there were a way to use that remaining memory by passing a batch of 2 prompts instead of a single prompt. Is there any support for this?