okuvshynov / slowllama

Finetune llama2-70b and codellama on MacBook Air without quantization

slowllama: split each block to attention and feed forward #7

Closed okuvshynov closed 8 months ago

okuvshynov commented 8 months ago

Essentially a tradeoff between RAM usage and the number of disk writes: splitting each transformer block into its attention and feed-forward halves means only half a block needs to be resident in RAM at a time, but it roughly doubles the number of disk round-trips per block when offloading.
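
For illustration, here is a minimal PyTorch sketch of the idea, not slowllama's actual implementation. All names (`OffloadedBlock`, `load_module`, `save_module`, the weight paths) are hypothetical; the point is that each half of the block is loaded from disk, run, and written back independently, so peak RAM holds one sub-module instead of a whole block, at the cost of twice as many disk writes.

```python
import torch
import torch.nn as nn


def load_module(path: str) -> nn.Module:
    # Hypothetical helper: bring one sub-module back from disk into RAM.
    return torch.load(path)


def save_module(module: nn.Module, path: str) -> None:
    # Hypothetical helper: write the (possibly updated) sub-module back
    # to disk and free its RAM; needed during finetuning, when weights change.
    torch.save(module, path)


class OffloadedBlock(nn.Module):
    """One transformer block whose attention and FFN halves live on disk."""

    def __init__(self, attn_path: str, ffn_path: str):
        super().__init__()
        self.attn_path = attn_path
        self.ffn_path = ffn_path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Load, run, and offload the attention half; only it is in RAM now.
        attn = load_module(self.attn_path)
        x = x + attn(x)
        save_module(attn, self.attn_path)
        del attn

        # Then do the same for the feed-forward half: one extra disk
        # round-trip, but peak RAM is max(attn, ffn) instead of their sum.
        ffn = load_module(self.ffn_path)
        x = x + ffn(x)
        save_module(ffn, self.ffn_path)
        del ffn
        return x
```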