neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Support for Llama2? #1322

Closed — BDHU closed this issue 1 year ago

BDHU commented 1 year ago

Is it possible to test the speedup gained by applying sparsity to Llama 2? If so, are there any tutorials? Thanks!

mgoin commented 1 year ago

Hi @BDHU, it is completely possible to see speedups on Llama 2 models similar to those on MPT, since the two architectures are very similar! We don't have sparsified Llama 2 examples yet, but we are working on them and hope to share in the coming weeks.
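
In the meantime, here is a rough sketch of how one could time generation on a sparse MPT model with the DeepSparse text-generation pipeline; the SparseZoo stub, prompt, and `max_new_tokens` value are illustrative placeholders, and the exact pipeline API depends on your DeepSparse version:

```python
# Rough sketch: timing generation on a sparsified MPT model with DeepSparse.
# The SparseZoo stub below is illustrative; check SparseZoo for the
# currently published sparse models before running this.
import time

from deepsparse import TextGeneration

# Assumed stub for a 50%-pruned, quantized MPT-7B on SparseZoo
pipeline = TextGeneration(model="zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized")

prompt = "Explain sparsity-aware inference in one sentence."

start = time.perf_counter()
result = pipeline(prompt, max_new_tokens=64)
elapsed = time.perf_counter() - start

print(result.generations[0].text)
print(f"generated in {elapsed:.2f}s")
```

Running the same script against a dense export of the model would give a baseline for comparing the sparsity speedup.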

jeanniefinks commented 1 year ago

@BDHU If you want to be proactively alerted when our Llama 2 developments land, I strongly encourage you and others reading this to sign up for our newsletter at the bottom right of https://neuralmagic.com/contact/ — the traffic is low, I promise! We are very excited to bring these product developments to you. I will cross-reference your other similar inquiry just in case and close out this thread. Feel free to reopen if you have additional questions.

Best, Jeannie // Neural Magic