openvinotoolkit / openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™
Apache License 2.0

Add weights compression to Florence-2 notebook #2343

Closed l-bat closed 1 week ago

eaidova commented 1 week ago

@l-bat are you sure that it gives any speedup? Because Florence-2 is a small model (<1B parameters), I do not think it can get any benefit from weight compression.

l-bat commented 1 week ago
@eaidova, you're right; the speedup is quite small at 1.07x. However, we can still benefit from compressing the weights to 4 bits:

| Model | FP16, MB | U4, MB | Compression rate |
|---|---|---|---|
| decoder | 185 | 67 | 2.8 |
| decoder_with_past | 172 | 64 | 2.7 |
| encoder | 83 | 24 | 3.5 |
| image_embedding | 175 | 50 | 3.5 |
| text_embedding | 76 | 38 | 2 |

With post-training quantization (PTQ), we can achieve a 1.13x speedup, but I don't think that is sufficient to justify adding quantization to this notebook.
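
For reference, the compression rates in the table can be reproduced from the reported sizes with a few lines of Python. This is only an arithmetic sketch: the sizes are copied from the comment above, while the names `sizes` and `compression_rate` and the aggregate total are illustrative additions that were not part of the original measurements.

```python
# Sizes copied from the table in this thread: (FP16 size, U4 size),
# in the units reported there. All names here are illustrative.
sizes = {
    "decoder": (185, 67),
    "decoder_with_past": (172, 64),
    "encoder": (83, 24),
    "image_embedding": (175, 50),
    "text_embedding": (76, 38),
}


def compression_rate(fp16_size, u4_size):
    """Ratio of the original FP16 size to the compressed size."""
    return fp16_size / u4_size


# Per-submodel rates, rounded to one decimal place as in the table.
rates = {
    name: round(compression_rate(fp16, u4), 1)
    for name, (fp16, u4) in sizes.items()
}

# Aggregate over all five submodels (not reported in the thread).
total_fp16 = sum(fp16 for fp16, _ in sizes.values())
total_u4 = sum(u4 for _, u4 in sizes.values())
total_rate = round(compression_rate(total_fp16, total_u4), 1)

for name, rate in rates.items():
    print(f"{name}: {rate}x")
print(f"total: {total_fp16} -> {total_u4} ({total_rate}x)")
```

Under these numbers, the five submodels together shrink from 691 to 243, roughly 2.8x overall. The rates stay below the nominal 4x of FP16 to 4-bit; one plausible reason, not confirmed in this thread, is that some layers are kept at higher precision and that quantization parameters add overhead.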