paperswithlove / papers-we-read

3 stars 0 forks

issues


RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

#55 JihoonJ opened 2 days ago
0
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

#54 JihoonJ opened 2 days ago
0
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

#53 JihoonJ opened 3 days ago
0
GPT-4o System Card

#52 runhani opened 3 weeks ago
0
ARIA : An Open Multimodal Native Mixture-of-Experts Model

#51 runhani opened 1 month ago
0
Emu3: Next-Token Prediction is All You Need

#50 runhani opened 1 month ago
0
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

#49 runhani opened 1 month ago
0
ORYX MLLM: On-demand Spatial-Temporal Understanding at Arbitrary resolution

#48 runhani opened 1 month ago
0
Pixtral 12B - the first-ever multimodal Mistral model.

#47 runhani opened 2 months ago
1
Idefics3 : Building and better understanding vision-language models: insights and future directions

#46 runhani opened 2 months ago
0
NVLM: Open Frontier-Class Multimodal LLMs

#45 JihoonJ opened 2 months ago
0
Qwen2-VL : 1D text, 2D arbitrary resolution image, 3D video over 20 minutes video with LM decoder

#44 runhani opened 2 months ago
0
Pegasus-v1 Technical Report

#43 runhani opened 2 months ago
0
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

#42 runhani opened 2 months ago
0
The Llama 3 Herd of Models (vision part only)

#41 JihoonJ opened 4 months ago
0
PaliGemma: A versatile 3B VLM for transfer

#40 runhani opened 4 months ago
0
Vision language models are blind

#39 blacklleye opened 4 months ago
0
AutoAD III: The Prequel -- Back to the Pixels

#38 runhani opened 4 months ago
2
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

#37 runhani opened 4 months ago
3
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

#36 runhani opened 5 months ago
0
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

#35 runhani opened 5 months ago
0
Extending Context Window of LLMs via Position Interpolation

#34 runhani opened 5 months ago
0
Chameleon: Mixed-Modal Early-Fusion Foundation Models

#33 runhani opened 6 months ago
0
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

#32 runhani opened 6 months ago
0
What matters when building vision-language models?

#31 runhani opened 6 months ago
1
Evaluating Task-based Effectiveness of MLLMs on Charts

#30 soohwan-hyun opened 6 months ago
0
What matters when building vision-language models?

#29 JihoonJ opened 6 months ago
0
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

#28 hjeun opened 6 months ago
0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

#27 hjeun opened 6 months ago
0
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

#26 JihoonJ opened 6 months ago
0
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

#25 JihoonJ opened 7 months ago
0
Idefics2: A Powerful 8B Vision-Language Model for the community

#24 runhani opened 7 months ago
0
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

#23 runhani opened 7 months ago
0
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

#22 runhani opened 7 months ago
1
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

#21 runhani opened 7 months ago
0
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

#20 runhani opened 7 months ago
0
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

#19 hjeun opened 7 months ago
0
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

#18 hjeun opened 7 months ago
0
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

#17 runhani opened 7 months ago
0
MMStar: Are We on the Right Way for Evaluating Large Vision-Language Models?

#16 JihoonJ opened 7 months ago
0
HPT - Open Multimodal Large Language Models

#15 runhani opened 7 months ago
2
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

#14 blacklleye opened 7 months ago
1
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

#13 hjeun opened 8 months ago
2
Image Captioners Are Scalable Vision Learners Too

#12 paperswithlove opened 8 months ago
0
Unifying Vision, Text, and Layout for Universal Document Processing

#11 paperswithlove opened 8 months ago
0
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

#10 runhani opened 8 months ago
0
Segment and Caption Anything

#9 runhani opened 8 months ago
0
Continual Test-Time Domain Adaptation

#8 runhani opened 8 months ago
0
Efficient Test-Time Model Adaptation without Forgetting

#7 runhani opened 8 months ago
0
When Do We Not Need Larger Vision Models? (from Hyunjun)

#6 runhani opened 8 months ago
2