GLM-4V-9B is the open-source multimodal model in GLM-4, the latest generation of pre-trained models released by Zhipu AI. It supports multi-turn conversations in Chinese and English at a high resolution of 1120 × 1120. In multimodal evaluations of combined Chinese and English ability, perceptual reasoning, text recognition, and chart understanding, GLM-4V-9B outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
Motivation
Multimodal Capabilities: GLM-4V-9B processes and understands both text and images, which makes it suitable for tasks such as image captioning and visual question answering.
High Resolution: Supports image inputs at resolutions up to 1120 × 1120 pixels.
Bilingual Support: Conducts multi-turn conversations in both Chinese and English.
Performance: In multimodal evaluations of combined Chinese and English ability, perceptual reasoning, text recognition, and chart understanding, it outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
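To make the resolution point concrete, here is a minimal sketch of how an integration might downscale an image's dimensions so that neither side exceeds 1120 pixels before sending it to the model. The helper name `fit_within` and the constant `MAX_SIDE` are hypothetical, and whether GLM-4V-9B's own preprocessing already resizes inputs is not stated in this issue, so treat this as an assumption rather than a required step.

```python
from typing import Tuple

# Resolution limit quoted in this issue; the constant name is illustrative.
MAX_SIDE = 1120


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, so that
    neither side exceeds max_side. Smaller images are returned unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(2240, 1120))  # wide image, scaled by 0.5
    print(fit_within(800, 600))    # already within the limit
```

The computed size could then be passed to whatever image library the project already uses for the actual resize.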
Feature Name
GLM-4V
Feature Description
GLM-4V-9B is the open-source multimodal model in GLM-4, the latest generation of pre-trained models released by Zhipu AI. It supports multi-turn conversations in Chinese and English at a high resolution of 1120 × 1120. In multimodal evaluations of combined Chinese and English ability, perceptual reasoning, text recognition, and chart understanding, GLM-4V-9B outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
Motivation
Potential Solutions
https://open.bigmodel.cn/dev/api/normal-model/glm-4
Additional Context (optional)
No response
Affected Areas
None
Priority
Low
Required Files