Please add support for the moondream model: https://github.com/vikhyat/moondream
A further idea, which may or may not be feasible (I do not know), is speculative decoding with a smaller draft model, along the lines of this paper: https://arxiv.org/abs/2310.07177
In my experience with LLMs, speculative decoding greatly speeds up inference. Doing the same here, with CogVLM as the main model and moondream as the draft model, could speed up captioning of large datasets: the small model proposes several tokens cheaply, and the large model only has to verify them in a single pass instead of generating each token itself.
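To illustrate the draft-then-verify loop, here is a toy sketch of the greedy variant of speculative decoding. Both "models" are stand-in functions (not CogVLM or moondream), and a real implementation would verify the drafted tokens in one batched forward pass; the point is only to show why the target model can be called far fewer times than the number of tokens generated.

```python
# Toy sketch of greedy speculative decoding.
# A cheap "draft" model proposes k tokens; the expensive "target" model
# verifies them and keeps the longest agreeing prefix. Both models below
# are hypothetical stand-ins, not actual CogVLM/moondream calls.

def draft_model(context):
    # Stand-in fast model: next token = last token + 1, wrapping at 10.
    return (context[-1] + 1) % 10

def target_model(context):
    # Stand-in slow model: agrees with the draft except after token 7,
    # where it emits 0 instead. This forces an occasional mismatch.
    return 0 if context[-1] == 7 else (context[-1] + 1) % 10

def speculative_decode(context, num_tokens, k=4):
    out = list(context)
    target_calls = 0
    while len(out) - len(context) < num_tokens:
        # 1) Draft k tokens cheaply with the small model.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft_model(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) Verify with the target model. In a real LLM this is one
        #    batched forward pass over all drafted positions; we simulate
        #    it sequentially but count it as a single target call.
        target_calls += 1
        ctx = list(out)
        for tok in proposal:
            correct = target_model(ctx)
            if correct == tok:
                ctx.append(tok)
            else:
                # 3) On the first mismatch, keep the target's token
                #    (so the output matches plain target decoding) and
                #    discard the rest of the draft.
                ctx.append(correct)
                break
        out = ctx
    return out[:len(context) + num_tokens], target_calls

seq, calls = speculative_decode([0], num_tokens=8, k=4)
print(seq, calls)  # 8 tokens generated with only 2 target-model calls
```

With greedy verification the output is identical to what the target model would produce on its own; the speed-up comes entirely from replacing most target calls with cheap draft calls, which is why pairing a large captioner with a small draft model like moondream could be attractive.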