microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License
354 stars 31 forks

Update dependencies #144

Closed msdmkats closed 3 months ago

msdmkats commented 4 months ago

Update some dependencies, and make dependencies that are not required for slicing optional.
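One common way to express this split is an extras group in `pyproject.toml`; a hedged sketch (the group name `finetune` and the exact package list are illustrative assumptions, not necessarily what this PR does):

```toml
# Sketch only: the extra group name and pins are illustrative assumptions.
[project]
dependencies = [
    # always needed for slicing
    "torch",
    "transformers",
]

[project.optional-dependencies]
finetune = [
    # only needed for the finetuning path; newer peft broke finetuning
    "peft==0.6.0",
    "datasets",
]
```

With a layout like this, users who only slice install the base package, while finetuners install the extra with `pip install .[finetune]`.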

msdmkats commented 4 months ago

We will need to do a full slicing and finetuning run to verify that the dependencies (datasets, transformers, peft) don't lead to issues. I spent a while debugging why the finetuning loss was NaN; it turned out peft versions are not backwards compatible. Pinning to peft==0.6.0 solved the issue. This may be due to an interface change, so I suggest we do a full check (not just the tests) before updating dependencies.

peft 0.9.0 indeed caused problems. I rolled it back to 0.6.0, but also made it optional, since it apparently isn't needed for the actual slicing. Other than that, the logs for finetuning phi-2 (sliced with sparsity 0.25, using alpaca for calibration, with the parameters from the README) end with:

```
{'loss': 1.384, 'grad_norm': 39.89509582519531, 'learning_rate': 0.00012, 'epoch': 0.17}
{'eval_loss': 1.1535898447036743, 'eval_runtime': 16.2278, 'eval_samples_per_second': 7.888, 'eval_steps_per_second': 0.986, 'epoch': 0.17}
```

And lm_eval's score on the piqa task for that model is 0.745 both before and after finetuning.
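Making peft optional for slicing usually comes down to deferring the import to the code path that needs it; a minimal sketch (the helper name is an assumption, not this repo's actual API):

```python
def require_peft():
    """Import peft lazily so plain slicing works without it installed."""
    try:
        import peft  # optional dependency: only the finetuning path needs it
    except ImportError as exc:
        raise ImportError(
            "finetuning requires the optional peft dependency; "
            "install the pinned version with: pip install peft==0.6.0"
        ) from exc
    return peft
```

The finetuning entry point would call `require_peft()` once at startup, so slicing-only users never pay for (or break on) the peft install.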