jnehring / llm_tools


Memory efficiency #9

Closed jnehring closed 1 year ago

jnehring commented 1 year ago

Hugging Face reads the whole model into main memory and only then copies it to GPU memory. So for a 27 GB model we need 27 GB of RAM during startup, even though it is no longer needed afterwards.

I remember there is a way to stream the model weights into GPU memory instead of the behaviour above. We should implement this to save a large amount of RAM.
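A minimal sketch of what this could look like with `transformers`: `low_cpu_mem_usage=True` loads the checkpoint shard by shard instead of first materializing the whole model in RAM, and `device_map="auto"` (via `accelerate`) places weights on the GPU as they are read. The model name below is only illustrative, not the one used in this repo.

```python
def low_ram_load_kwargs():
    """Keyword arguments for from_pretrained() that avoid holding the
    full model in main memory during startup.

    low_cpu_mem_usage=True streams weights shard by shard instead of
    building a fully initialized copy in RAM first; device_map="auto"
    lets accelerate move each shard directly onto the GPU.
    """
    return {"low_cpu_mem_usage": True, "device_map": "auto"}

# Usage (requires torch, transformers and accelerate; not executed here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bigscience/bloom-7b1",  # illustrative model name
#     **low_ram_load_kwargs(),
# )
```

Whether `device_map="auto"` is appropriate depends on whether the model fits on a single GPU; with multiple devices it will also shard layers across them.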

jnehring commented 1 year ago

done