irthomasthomas / undecidability

DeepSeek-VL: Towards Real-World Vision-Language Understanding #726

Open irthomasthomas opened 4 months ago

irthomasthomas commented 4 months ago

Abstract

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions:

The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model.

URL: DeepSeek-VL: Towards Real-World Vision-Language Understanding

Suggested labels

irthomasthomas commented 4 months ago

Related content

#184 (similarity score: 0.9)
#722 (similarity score: 0.89)
#628 (similarity score: 0.88)
#554 (similarity score: 0.87)
#383 (similarity score: 0.87)
#189 (similarity score: 0.87)