l-bat closed this pull request 1 week ago
@l-bat are you sure that it gives any speedup? Because Florence-2 is a small model (<1B parameters), I do not think it can get much benefit from weight compression.
@eaidova, you're right; the speedup is quite small at 1.07x. However, we can still benefit from compressing the weights to 4 bits:

Model | FP16, MB | U4, MB | Compression rate |
---|---|---|---|
decoder | 185 | 67 | 2.8 |
decoder_with_past | 172 | 64 | 2.7 |
encoder | 83 | 24 | 3.5 |
image_embedding | 175 | 50 | 3.5 |
text_embedding | 76 | 38 | 2 |
With PTQ, we can achieve a 1.13x speedup, but I don't think this is sufficient to justify adding quantization to this notebook.
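As a side note, the per-model compression rates above (2x–3.5x) fall short of the theoretical 4x for FP16 → INT4. A back-of-envelope sketch of why: group-wise 4-bit quantization stores an extra scale per group, and some layers are typically kept at higher precision. The `group_size=128` and FP16 scales below are illustrative assumptions, not values taken from this PR:

```python
# Rough estimate of the FP16 -> INT4 size ratio for group-wise
# weight quantization. Assumptions (not from the PR): one FP16
# scale per group of 128 weights; zero-points and metadata ignored.

def compressed_ratio(bits: int = 4, group_size: int = 128, scale_bits: int = 16) -> float:
    """Approximate size ratio of FP16 weights vs. packed INTn + per-group scales."""
    fp16_bits = 16
    packed_bits_per_weight = bits + scale_bits / group_size  # payload + scale overhead
    return fp16_bits / packed_bits_per_weight

# Even before keeping any layers in FP16, the ideal ratio is a bit under 4x.
print(round(compressed_ratio(), 2))
```

Layers left uncompressed (e.g. small or accuracy-sensitive ones) pull the observed ratio further down, which matches the 2x figure for `text_embedding`.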