princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License

ShearedCodeLLama #35

Closed SinanAkkoyun closed 9 months ago

SinanAkkoyun commented 9 months ago

Hi! I am working on a copilot backend and, even though I am using a GPTQ quant of CodeLlama-7B, it still eats a lot of VRAM. DeepSeek Coder seems to have severe issues understanding fill-in-the-middle prompts.

I wanted to ask if you plan on also shearing CodeLlama? :)

YanxiZSQ commented 9 months ago

I would also like to shear CodeLlama. Did you succeed?

SinanAkkoyun commented 9 months ago

@YanxiZSQ It would cost around $2k, based on this estimate: https://github.com/princeton-nlp/LLM-Shearing/issues/22

SinanAkkoyun commented 9 months ago

I just fixed my DeepSeek prompt and I have to say these are very good models. Closing!
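
For anyone hitting the same fill-in-the-middle issue: the fix usually comes down to using the model's expected FIM sentinel tokens. A minimal sketch of a prompt builder, assuming the commonly documented token spellings for CodeLlama (`<PRE>`/`<SUF>`/`<MID>`) and DeepSeek Coder (`<｜fim▁begin｜>`/`<｜fim▁hole｜>`/`<｜fim▁end｜>`); verify against the tokenizer config of the exact checkpoint before relying on them:

```python
# Sketch of a fill-in-the-middle (FIM) prompt builder.
# The sentinel token strings are assumptions based on each model family's
# published prompt formats; check the model's tokenizer config before use.

def build_fim_prompt(prefix: str, suffix: str, style: str = "codellama") -> str:
    """Wrap code context in FIM sentinel tokens; the model fills the middle."""
    if style == "codellama":
        # CodeLlama infilling order: prefix, then suffix, then generate middle.
        return f"<PRE> {prefix} <SUF>{suffix} <MID>"
    if style == "deepseek":
        # DeepSeek Coder uses its own sentinel tokens (assumed spelling).
        return f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
    raise ValueError(f"unknown style: {style}")

# Example: ask the model to complete the body of a function.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

Getting the exact token strings and ordering right matters more than it looks: a FIM model given plain left-to-right context will often ignore the suffix entirely, which can look like the model "not understanding" fill-in-the-middle.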