Closed: wjkoh closed this issue 9 months ago
You should use `torch.float32`.
Thanks for the feedback. It actually works with `torch.float32`! However, it kind of defeats the purpose of reducing VRAM usage, since we have to switch from float16 to float32 when enabling model CPU offload.
I encountered a `RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'` when modifying the `infer.py` file in the following manner:

The call stack is as follows:
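For context, a minimal sketch of the underlying issue and the float32 workaround being discussed (the actual `infer.py` changes and pipeline are not shown here, and `safe_linear` is a hypothetical helper, not part of any library): many PyTorch builds have no CPU kernel for Half-precision matrix multiplication, which is exactly what triggers `"addmm_impl_cpu_" not implemented for 'Half'` when an fp16 layer gets offloaded to the CPU. One common pattern is to upcast to float32 only for the CPU computation and cast the result back:

```python
import torch

def safe_linear(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Linear layer that tolerates fp16 tensors living on the CPU.

    Hypothetical illustration: on CPU, Half addmm may be unimplemented,
    so we compute in float32 there and cast back to keep downstream
    dtypes unchanged. On GPU, fp16 is used directly to save VRAM.
    """
    if x.device.type == "cpu" and x.dtype == torch.float16:
        out = torch.addmm(bias.float(), x.float(), weight.float().t())
        return out.half()  # restore fp16 so the rest of the model is unaffected
    return torch.addmm(bias, x, weight.t())

# fp16 tensors on the CPU, as after model CPU offload
x = torch.randn(2, 4, dtype=torch.float16)
w = torch.randn(3, 4, dtype=torch.float16)
b = torch.zeros(3, dtype=torch.float16)
y = safe_linear(x, w, b)
print(y.dtype, tuple(y.shape))
```

This keeps the VRAM savings of fp16 on the GPU while avoiding the missing CPU kernel, at the cost of the temporary float32 copies on the CPU side.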