microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

Make sliced models HuggingFace compatible #139

Open LianaMikael opened 4 months ago

LianaMikael commented 4 months ago

This PR adds implementations of sliced Phi and Llama models so that sliced models are easy to save and load. The models can be initialized with a given scheduler (or with no scheduler, for zero sparsity) and support the save_pretrained and from_pretrained methods, like standard HF models.
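As a rough illustration of the save/load round trip described above, here is a minimal standard-library sketch. It is not the code in this PR: SlicedConfigSketch, constant_schedule and SlicedModelSketch are hypothetical names, the "scheduler" is assumed to be a simple function that assigns the same reduced width to every layer, and only the configuration (no weights) is persisted.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class SlicedConfigSketch:
    """Hypothetical config; field names are assumptions, not the PR's."""
    hidden_size: int
    num_layers: int
    sparsity: float = 0.0  # 0.0 corresponds to "no scheduler": nothing is sliced


def constant_schedule(config: SlicedConfigSketch) -> List[int]:
    """Hypothetical scheduler: the same sliced width for every layer."""
    width = int(config.hidden_size * (1.0 - config.sparsity))
    return [width] * config.num_layers


class SlicedModelSketch:
    """Stand-in for a sliced model; it tracks per-layer widths only."""

    def __init__(self, config: SlicedConfigSketch,
                 layer_dims: Optional[List[int]] = None):
        self.config = config
        # Without a schedule, every layer keeps the full hidden size.
        if layer_dims is None:
            layer_dims = [config.hidden_size] * config.num_layers
        self.layer_dims = layer_dims

    def save_pretrained(self, save_dir: str) -> None:
        # Persist the sliced shapes next to the config so that loading
        # can rebuild the same architecture before restoring any weights.
        path = Path(save_dir)
        path.mkdir(parents=True, exist_ok=True)
        payload = {"config": asdict(self.config), "layer_dims": self.layer_dims}
        (path / "config.json").write_text(json.dumps(payload))

    @classmethod
    def from_pretrained(cls, save_dir: str) -> "SlicedModelSketch":
        payload = json.loads((Path(save_dir) / "config.json").read_text())
        return cls(SlicedConfigSketch(**payload["config"]), payload["layer_dims"])


if __name__ == "__main__":
    cfg = SlicedConfigSketch(hidden_size=64, num_layers=2, sparsity=0.25)
    model = SlicedModelSketch(cfg, constant_schedule(cfg))
    with tempfile.TemporaryDirectory() as tmp:
        model.save_pretrained(tmp)
        reloaded = SlicedModelSketch.from_pretrained(tmp)
    print(reloaded.layer_dims)
```

The point of the sketch is that the sliced dimensions have to be stored with the checkpoint: a sliced model no longer matches the shapes implied by the original config, so loading needs them to reconstruct the architecture.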

pashminacameron commented 4 months ago

I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.

canamika27 commented 4 months ago

> I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.

Is there any update on when these changes will be merged?

LianaMikael commented 3 months ago

> > I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.
>
> Is there any update on when these changes will be merged?

We will work on merging these changes by the end of this week.