okuvshynov / slowllama

Finetune llama2-70b and codellama on MacBook Air without quantization

slowllama: split each block to attention and feed forward #7

Closed okuvshynov closed 8 months ago

okuvshynov commented 8 months ago

Essentially a tradeoff between RAM usage and the number of disk writes: splitting each transformer block into its attention and feed-forward halves means only half a block needs to be resident in RAM at a time, but it roughly doubles the number of disk round-trips per block when offloading.
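
For illustration, here is a minimal PyTorch sketch of the idea, not slowllama's actual implementation. All names (`OffloadedBlock`, `load_module`, `save_module`, the weight paths) are hypothetical; the point is that each half of the block is loaded from disk, run, and written back independently, so peak RAM holds one sub-module instead of a whole block, at the cost of twice as many disk writes.

```python
import torch
import torch.nn as nn


def load_module(path: str) -> nn.Module:
    # Hypothetical helper: bring one sub-module back from disk into RAM.
    return torch.load(path)


def save_module(module: nn.Module, path: str) -> None:
    # Hypothetical helper: write the (possibly updated) sub-module back
    # to disk and free its RAM; needed during finetuning, when weights change.
    torch.save(module, path)


class OffloadedBlock(nn.Module):
    """One transformer block whose attention and FFN halves live on disk."""

    def __init__(self, attn_path: str, ffn_path: str):
        super().__init__()
        self.attn_path = attn_path
        self.ffn_path = ffn_path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Load, run, and offload the attention half; only it is in RAM now.
        attn = load_module(self.attn_path)
        x = x + attn(x)
        save_module(attn, self.attn_path)
        del attn

        # Then do the same for the feed-forward half: one extra disk
        # round-trip, but peak RAM is max(attn, ffn) instead of their sum.
        ffn = load_module(self.ffn_path)
        x = x + ffn(x)
        save_module(ffn, self.ffn_path)
        del ffn
        return x
```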