neuralmagic / sparsify

ML model optimization product to accelerate inference.

Pretraining-style training-aware sparsification #285

Closed kmn1024 closed 10 months ago

kmn1024 commented 11 months ago

Is your feature request related to a problem? Please describe.
I would like to try sparsifying several pretrained LLMs (e.g. Mistral 7b, Stable LM 3b, etc.). I have created a pretraining corpus (for causal LLMs) on topics I care about. The corpus is relatively small by LLM pretraining standards, around 10B tokens, but gigantic by fine-tuning standards. It seems such a corpus would be ideal for trying this out: https://github.com/neuralmagic/sparsify/blob/main/docs/training-aware-experiment-guide.md.

Describe the solution you'd like
Reading through the experiment guide, I cannot identify an appropriate dataset option for causal pretraining data. I would appreciate some pointers on what I can try (let me be your guinea pig!).
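
To make this concrete, here is roughly the kind of run I have in mind, sketched with plain PyTorch gradual magnitude pruning rather than Sparsify's own training-aware pathway; the model name, stage count, and 50% target sparsity below are just placeholders:

```python
# A generic sketch of "training-aware" sparsification for a causal LM:
# gradual magnitude pruning interleaved with continued training on the corpus.
# Plain PyTorch (torch.nn.utils.prune), not Sparsify's own pathway; the model
# name, stage count, and 50% target sparsity are placeholders.
import torch
from torch.nn.utils import prune
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # any causal LM; a small model is fine for a dry run
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Prune every linear projection in the network.
prunable = [m for m in model.modules() if isinstance(m, torch.nn.Linear)]

# Placeholder batch; in practice this would iterate over tokenized chunks
# of the 10B-token corpus.
batch = tokenizer("example pretraining document", return_tensors="pt")

num_stages, per_stage = 5, 0.13  # 1 - (1 - 0.13)**5 ≈ 0.50 total sparsity
for stage in range(num_stages):
    # Each call prunes 13% of the *remaining* weights; the masks stay applied
    # during the training steps that follow, so pruned weights remain zero.
    for module in prunable:
        prune.l1_unstructured(module, name="weight", amount=per_stage)

    # Recovery step(s) between pruning stages (a single step shown here).
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Fold the final masks into the weights once training is done.
for module in prunable:
    prune.remove(module, "weight")
```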

rgreenberg1 commented 10 months ago

Hi @kmn1024,

Sorry for the delayed response here. Currently, Sparsify doesn't have strong support for LLMs outside of One-Shot, so I would start there: https://github.com/neuralmagic/sparsify/blob/main/docs/one-shot-experiment-guide.md.
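
For a rough mental model of the difference: One-Shot applies the entire sparsity budget in a single post-training pass over the trained model and a small calibration set (see the guide for the exact inputs), rather than gradually during training. Here is a minimal plain-PyTorch illustration of that idea, not our actual One-Shot implementation, with a placeholder model and output path:

```python
# One-Shot style: the full sparsity budget is applied in a single post-training
# pass, with no weight updates afterwards. Plain-PyTorch illustration only, not
# Sparsify's One-Shot implementation; model name and output path are placeholders.
import torch
from torch.nn.utils import prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

# Prune 50% of the weights in every linear layer in one pass.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weights

model.save_pretrained("mistral-7b-50pct-sparse")
```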

We're also working on landing many more LLM pathways in SparseML shortly, along with lots of new LLM research (you can check out our most recent blog post here: https://neuralmagic.com/blog/navigating-the-nuances-of-text-generation-how-to-control-llm-outputs-with-deepsparse/). I hope this helps you get started on the right track! Feel free to also join our Slack community and DM me @ rob greenberg if you have any further questions.