dicksensei69 closed this 6 months ago
The int4 version just dropped: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B-int4/tree/main
I plan to add this model, but unfortunately, I will be too busy for the next week or so.
It seems to be only available for Linux.
How strange, I thought they all ran on Transformers. I personally have a Linux system, so it won't be a problem for me, but I understand if you don't want to implement it anymore. It still looks like a powerful model, with its vision resolution and Llama 3 base. Thanks for your efforts, I really like taggui :)
It does use Transformers, but it's not officially integrated into the library, so it contains custom code and dependencies.
I am still working on adding it for Linux users.
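On the custom-code point above: models that are not officially integrated into Transformers are typically loaded with `trust_remote_code=True`, which lets the library run the Python files shipped in the model repository. The following is only a minimal sketch of that pattern, not how taggui does or will load the model. The helper names `build_load_kwargs` and `load_cogvlm2` are hypothetical, and the extra dependencies the repository needs are not shown.

```python
# Repository named earlier in this thread.
MODEL_ID = "THUDM/cogvlm2-llama3-chat-19B-int4"


def build_load_kwargs() -> dict:
    """Keyword arguments for from_pretrained (hypothetical helper)."""
    return {
        # Required because the model's code lives in the model repository,
        # not in the Transformers library itself.
        "trust_remote_code": True,
        # Reduces peak CPU RAM usage while the weights are loaded.
        "low_cpu_mem_usage": True,
    }


def load_cogvlm2(model_id: str = MODEL_ID):
    """Load tokenizer and model (hypothetical helper, downloads the weights)."""
    # Imported lazily so the module can be imported without Transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs())
    return tokenizer, model.eval()
```

Calling `load_cogvlm2()` downloads the weights and executes code from the model repository, so it should only be used with repositories you trust.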
The recently launched CogVLM2 model series from THUDM offers significant improvements in image understanding and captioning capabilities. With its support for longer text inputs (up to 8K tokens) and higher image resolutions (up to 1344x1344 pixels), CogVLM2 could greatly enhance Taggui's automatic caption and tag generation feature.
Benefits:
- Improved accuracy and quality of automatically generated captions and tags, leveraging CogVLM2's advanced image understanding capabilities.
- Support for higher image resolutions, accommodating a wider range of use cases.
- Enhanced user experience by providing access to the latest advancements in vision-language models within Taggui's familiar interface.