microsoft / TransformerCompression

Code related to compression methods for transformers, accompanying our publications
MIT License

Add QuaRot (no quantization yet) #142

Closed · nailimixaM closed 4 months ago

nailimixaM commented 4 months ago

This PR adds the minimum required to apply QuaRot to a Llama-2 7B model so that it is activation-and-weight quantizable, without actually performing any quantization with RTN/GPTQ.
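
For context, here is a minimal sketch of the core QuaRot idea: fuse a random orthogonal (Hadamard-based) rotation into the weights offline, which leaves the network's output unchanged while spreading out activation outliers so they quantize better. This is an illustration of the technique only, not this repository's API; `random_hadamard` is a hypothetical helper.

```python
import torch


def random_hadamard(n: int, device=None) -> torch.Tensor:
    """Build a normalized Hadamard matrix of size n (n must be a power of two).

    Sylvester construction: H_{2k} = [[H_k, H_k], [H_k, -H_k]] / sqrt(2),
    which yields an orthonormal matrix with entries +/- 1/sqrt(n).
    """
    assert n & (n - 1) == 0, "n must be a power of two"
    H = torch.ones(1, 1, device=device)
    while H.shape[0] < n:
        H = torch.cat(
            [torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0
        ) / (2 ** 0.5)
    return H


# Computational invariance: for orthogonal Q,
#   (x @ Q) @ (Q.T @ W) == x @ W   (up to floating-point error),
# so Q can be fused into the weights offline, and the rotated
# weights/activations (with fewer outliers) are what get quantized.
d = 8
Q = random_hadamard(d)
W = torch.randn(d, d)
x = torch.randn(2, d)
assert torch.allclose(x @ W, (x @ Q) @ (Q.T @ W), atol=1e-5)
```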

nailimixaM commented 4 months ago

@jameshensman I pushed our fixes from last week (refactoring Hadamard), which fixed the PR build. Let me know what you think; if it's all good, I'll merge into quarot main.

jameshensman commented 4 months ago

Approved.
