swarmauri / swarmauri-sdk

a modular multimodal framework for ai applications
https://swarmauri.com
Apache License 2.0
70 stars 41 forks source link

[Feature Request]: GLM-4V #762

Open MichaelDecent opened 1 week ago

MichaelDecent commented 1 week ago

Feature Name

GLM-4V

Feature Description

GLM-4V-9B is an open source multimodal version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. GLM-4V-9B has the ability to conduct multi-round conversations in Chinese and English at a high resolution of 1120 * 1120. In multimodal evaluations of comprehensive Chinese and English abilities, perceptual reasoning, text recognition, and chart understanding, GLM-4V-9B has shown superior performance over GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

Motivation

Potential Solutions

https://open.bigmodel.cn/dev/api/normal-model/glm-4

Additional Context (optional)

No response

Affected Areas

None

Priority

Low

Required Files