microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

Add Llama3 support to llama_adapter #147

Closed · radhikamp99 closed 3 months ago

radhikamp99 commented 3 months ago

Adding support for Llama 3 models via the existing llama model adapter - there are no architectural changes since Llama 2.
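For context, the reuse works because Hugging Face transformers maps both model families to the same architecture class. A quick way to confirm this (note that both repos are gated and require authentication):

```python
# Confirm that Llama 2 and Llama 3 share the same architecture class in
# Hugging Face transformers, which is why the existing adapter suffices.
from transformers import AutoConfig

for name in ("meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B"):
    cfg = AutoConfig.from_pretrained(name)
    print(name, cfg.architectures)  # both report ['LlamaForCausalLM']
```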

The newly supported Llama 3 models are listed in the README.

Ran test_model_adapter.py and all tests passed. Ran SliceGPT and finetuning experiments and evaluated to get the following results:

model: Meta-Llama-3-8B
piqa: original: 0.8079, sliced @ 25%: 0.5871, recovery finetuned: 0.6817

I was unable to test the 70B models due to memory constraints.
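For anyone reproducing the piqa numbers: slicing itself is done beforehand via the repo's experiment scripts, and the resulting model can then be scored with EleutherAI's lm-evaluation-harness. A minimal sketch, assuming the v0.4 `simple_evaluate` API (result keys may differ by version, and the model path below is just a placeholder):

```python
# Minimal sketch: scoring a model on PIQA with lm-evaluation-harness.
# Assumes the v0.4 simple_evaluate API; exact result keys may vary by version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",  # or a sliced checkpoint
    tasks=["piqa"],
)
print(results["results"]["piqa"]["acc,none"])
```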

radhikamp99 commented 3 months ago

> Brilliant, thanks so much for this Radhika! Two minor requests:
>
>   • If you could update the README to say we now support these models too, that would be great
>   • Could you run a slicing + finetuning experiment and report back numbers, like Pashmina did in her Phi-3 PR?

I was mid-way through editing the PR description - apologies if it was confusing. I'm currently running the experiments and will add the numbers to the PR and mark it as ready for review when done. I've added the list of supported models to the README - is there anything else we need to add there?

radhikamp99 commented 3 months ago

@nailimixaM Added the piqa results above!

pashminacameron commented 3 months ago

Suggest changing the title of the PR to "Add Llama3 support to llama_adapter"

nailimixaM commented 3 months ago

> @nailimixaM Added the piqa results above!

Thanks - is this with wikitext or alpaca for slicing and finetuning?

radhikamp99 commented 3 months ago

> > @nailimixaM Added the piqa results above!
>
> Thanks - is this with wikitext or alpaca for slicing and finetuning?

I used the default set-up, so it would be wikitext2.
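For readers outside the project: "default set-up" means the dataset argument was simply left at its default value. Illustratively, the kind of configuration implied by this exchange (the flag name is hypothetical, not necessarily the experiment script's real interface):

```python
import argparse

# Hypothetical flag mirroring the exchange above: wikitext2 is the
# default calibration/finetuning dataset, alpaca the alternative.
parser = argparse.ArgumentParser()
parser.add_argument("--cal-dataset", default="wikitext2",
                    choices=["wikitext2", "alpaca"])

args = parser.parse_args([])  # no flags passed -> the "default set-up"
print(args.cal_dataset)       # wikitext2
```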