microsoft / Moonlit

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
MIT License

Support for Compresso pruned weights removal #45

Open Tyler-Durden-official opened 10 months ago

Tyler-Durden-official commented 10 months ago

Currently, after merging the pruning masks and the LoRA weights, the LLaMA-7B checkpoint grows from 15 GB to 26 GB on disk. Please add support for removing the pruned weights from the merged model.
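As a stopgap (not an official Compresso feature), the reported growth is consistent with the merged checkpoint being serialized in fp32, and the masked weights still occupying their full dense shape. The sketch below assumes a standard Hugging Face LLaMA layout; the paths are hypothetical. It re-saves the merged model in half precision and reports how many MLP rows the masks actually zeroed out, to estimate what physical removal could save.

```python
# Minimal, illustrative sketch (not Compresso's own tooling); paths are hypothetical.
import torch
from transformers import AutoModelForCausalLM

MERGED_PATH = "path/to/merged-llama-7b"  # hypothetical location of the merged model

# 1) Re-save in half precision: if the merge step wrote fp32 tensors, loading in
#    fp16 and saving again roughly halves the on-disk size.
model = AutoModelForCausalLM.from_pretrained(MERGED_PATH, torch_dtype=torch.float16)
model.save_pretrained("llama-7b-merged-fp16", safe_serialization=True)

# 2) Report how much of each MLP intermediate dimension is fully zeroed out.
#    Physically removing those channels would also require shrinking
#    intermediate_size in the config (per layer), which the stock LlamaConfig
#    does not express, so this only measures the potential saving.
for name, module in model.named_modules():
    if name.endswith("mlp.up_proj"):
        zero_rows = (module.weight.abs().sum(dim=1) == 0).sum().item()
        print(f"{name}: {zero_rows}/{module.weight.shape[0]} pruned rows")
```

Actually dropping the pruned channels from the saved tensors would need repo-side support, since the loader has to know the reduced per-layer shapes.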