Open: K2triinK opened this issue 2 years ago
Hey @K2triinK
I am wrapping up this PR, which addresses some of your questions, such as the model size reduction. As for the kernels, we are working on a plan to release them soon so that you can give them a try. Thanks, Reza
Describe the bug
I was expecting a compressed & faster BERT model after running the BERT ZeroQuant example in DeepSpeedExamples. However, the clean model isn't any smaller (still 417.7 MB) or faster (in fact, it's slower) than the original.
To Reproduce
Expected behavior
I expected the final clean model to be a compressed version of the original one, and therefore smaller & faster, but it isn't.
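For context on the size expectation, here is a rough back-of-the-envelope sketch of how large the raw weights would be at different precisions. It assumes a model of about 110 million parameters (roughly BERT-base size); the exact variant is not stated in this report, so `ASSUMED_PARAMS` and the helper `estimated_size_mib` are illustrative only. Under that assumption, 32-bit weights come to roughly 420 MiB, close to the 417.7 MB reported above, which would be consistent with the saved checkpoint still storing full-precision tensors.

```python
# Back-of-the-envelope size of raw model weights at a given precision.
# ASSUMPTION: ~110 million parameters (roughly BERT-base); the report does
# not state the exact model variant, so this number is illustrative.
ASSUMED_PARAMS = 110_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_size_mib(num_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in MiB.

    Ignores file metadata, quantization scales, and any layers that are
    kept in higher precision.
    """
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = estimated_size_mib(ASSUMED_PARAMS, dtype)
        print(f"{dtype}: {size:.1f} MiB")
```

If the weights were actually stored as 8-bit integers, the same estimate would be about a quarter of the 32-bit figure, which is the kind of reduction I was expecting to see on disk.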
ds_report output
System info (please complete the following information):