shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Model inference speed is slow #3

Closed · hlz0606 closed 6 months ago

hlz0606 commented 8 months ago

Is this an environment problem, or is the algorithm itself slow? Are there any recommended parameter settings that balance speed and performance?

shikiw commented 8 months ago

Hi,

The inference speed of OPERA is slower than that of common decoding methods like beam search, mainly due to (1) the extra inference cost of evaluating the candidates of each beam hypothesis, and (2) the extra cost of retrospection.

For efficiency, it is recommended to increase `threshold` or decrease `num_attn_candidates` before generation, e.g., `threshold=20` and `num_attn_candidates=2`.
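
As a rough illustration, here is a minimal sketch of how these knobs might be passed at generation time. Only `threshold` and `num_attn_candidates` come from this thread; the input format and all other kwargs (`opera_decoding`, `num_beams`, `max_new_tokens`, `output_attentions`) are assumptions about a typical OPERA-style `generate()` call, not confirmed from this repo:

```python
import torch

def generate_efficient(model, inputs):
    """Hypothetical helper: loosen OPERA's knobs for faster decoding.

    Only `threshold` and `num_attn_candidates` are named in this thread;
    the remaining kwargs are assumed names for a typical OPERA-style
    generate() call and may differ in this codebase.
    """
    with torch.inference_mode():  # skip autograd bookkeeping during decoding
        return model.generate(
            inputs,
            num_beams=2,               # fewer beam hypotheses to score
            max_new_tokens=512,
            output_attentions=True,    # OPERA inspects self-attention maps
            opera_decoding=True,       # assumed flag enabling OPERA decoding
            threshold=20,              # higher threshold: retrospection fires less often
            num_attn_candidates=2,     # fewer candidates: less extra cost per beam
        )
```

Roughly speaking, raising `threshold` mainly cuts cost (2) above, since retrospection is triggered less often, while lowering `num_attn_candidates` cuts cost (1), since fewer candidates are evaluated per beam hypothesis.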

shikiw commented 7 months ago

Hi,

I have done some code optimizations that improve the inference speed by more than 30%.