Open hassanzadeh opened 6 months ago
Hey guys, this is a great library, but I have a question: is this library able to use memory as efficiently as Llama.cpp? In other words, if I'm using a checkpoint with Llama.cpp on a small iOS device, will the same checkpoint work with swift-transformers (after conversion to Core ML), or is there a possibility that more memory is needed?

The main difference is that we are not doing quantization yet, so you need to have enough memory to run the model weights in 16-bit mode. Llama.cpp can run models in 16-bit, but it can also quantize down to 4-bit, which drastically reduces memory needs.
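The 16-bit vs. 4-bit difference can be put in rough numbers. A minimal sketch, assuming a 7B-parameter checkpoint as an illustrative size (the model size and helper function are assumptions, not part of either library), counting weight storage only — activations, KV cache, and runtime overhead come on top:

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_weight / 8 / (1024 ** 3)

# Assumed example: a 7B-parameter checkpoint.
params = 7e9

fp16 = weight_memory_gib(params, 16)  # 16-bit, i.e. no quantization
q4 = weight_memory_gib(params, 4)     # 4-bit quantized weights

print(f"16-bit: {fp16:.1f} GiB")  # ~13.0 GiB
print(f" 4-bit: {q4:.1f} GiB")    # ~3.3 GiB
```

So on a device with a few GiB of usable memory, a checkpoint that fits in 4-bit form may simply not fit in 16-bit form.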