paperswithcode / galai

Model API for GALACTICA
Apache License 2.0
2.67k stars 275 forks

Running the huge model on CPU #41

Open cvinker opened 1 year ago

cvinker commented 1 year ago

From my understanding, when using the Transformers Accelerate tool, running the HUGE model means it needs to load the entire thing into RAM. Is there any way for it to start processing as it loads into RAM, or is loading it fully a necessity? I have 614GB of RAM. I am also curious whether there's a way to edit the program while the model is stored in memory. Is there any way to change how it processes on the CPU? I know that on the GPU you can choose between FP32, FP16, and INT8, but I don't know how to find info on running on the CPU beyond the huggingface.co example.
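As a rough sanity check on whether the weights fit in RAM, here is a back-of-the-envelope estimate of weight memory at each precision. This is a sketch, not anything from the repo: the ~120B parameter count for the huge model is an assumption (check the model card for the exact figure), and it counts weights only, ignoring activations, the KV cache, and any temporary copies made during loading.

```python
def model_ram_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB (weights only; no activations or KV cache)."""
    return num_params * bytes_per_param / 1024**3

# Hypothetical parameter count for the huge GALACTICA checkpoint.
params = 120e9

# Each parameter costs 4 bytes in FP32, 2 in FP16/BF16, 1 in INT8.
for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    print(f"{name}: ~{model_ram_gib(params, nbytes):.0f} GiB")
```

Under these assumptions, FP32 weights alone come to roughly 447 GiB, so 614 GB of RAM would hold them, but with limited headroom for inference overhead; FP16 or INT8 would leave considerably more room.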

domenicrosati commented 1 year ago

Could you use ONNX to optimize it first (i.e., Transformers Optimum)? You may want to do that anyway if you are running on CPU.

cvinker commented 1 year ago

I spent a few hours fiddling with it, but I kept getting errors. Is ONNX better to the point that I should investigate further? I think my system got botched while messing around with it. Would ONNX reduce memory usage? I struggled to find what exactly it would improve. Thanks!

saptarshi059 commented 1 year ago

I tried using ONNX with this. I don't think they have support for it yet.