issues
paperswithlove/papers-we-read
3 stars, 0 forks
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
#36
runhani
opened
1 month ago
0
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
#35
runhani
opened
1 month ago
0
Extending Context Window of LLMs via Position Interpolation
#34
runhani
opened
1 month ago
0
Chameleon: Mixed-Modal Early-Fusion Foundation Models
#33
runhani
opened
1 month ago
0
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
#32
runhani
opened
1 month ago
0
What matters when building vision-language models?
#31
runhani
opened
1 month ago
1
Evaluating Task-based Effectiveness of MLLMs on Charts
#30
soohwan-hyun
opened
1 month ago
0
What matters when building vision-language models?
#29
JihoonJ
opened
1 month ago
0
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
#28
hjeun
opened
2 months ago
0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
#27
hjeun
opened
2 months ago
0
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
#26
JihoonJ
opened
2 months ago
0
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
#25
JihoonJ
opened
2 months ago
0
Idefics2: A Powerful 8B Vision-Language Model for the community
#24
runhani
opened
2 months ago
0
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
#23
runhani
opened
2 months ago
0
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
#22
runhani
opened
2 months ago
1
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
#21
runhani
opened
2 months ago
0
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
#20
runhani
opened
2 months ago
0
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
#19
hjeun
opened
3 months ago
0
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
#18
hjeun
opened
3 months ago
0
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
#17
runhani
opened
3 months ago
0
MMStar: Are We on the Right Way for Evaluating Large Vision-Language Models?
#16
JihoonJ
opened
3 months ago
0
HPT - Open Multimodal Large Language Models
#15
runhani
opened
3 months ago
2
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
#14
blacklleye
opened
3 months ago
1
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
#13
hjeun
opened
3 months ago
2
Image Captioners Are Scalable Vision Learners Too
#12
paperswithlove
opened
3 months ago
0
Unifying Vision, Text, and Layout for Universal Document Processing
#11
paperswithlove
opened
3 months ago
0
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
#10
runhani
opened
3 months ago
0
Segment and Caption Anything
#9
runhani
opened
3 months ago
0
Continual Test-Time Domain Adaptation
#8
runhani
opened
3 months ago
0
Efficient Test-Time Model Adaptation without Forgetting
#7
runhani
opened
3 months ago
0
When Do We Not Need Larger Vision Models? (from 현준님)
#6
runhani
opened
3 months ago
2
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
#5
blacklleye
opened
3 months ago
0
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
#4
runhani
opened
3 months ago
7
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
#3
runhani
opened
3 months ago
0
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
#2
soohwan-hyun
opened
3 months ago
0
DeepSeek-VL: Towards Real-World Vision-Language Understanding
#1
JihoonJ
opened
3 months ago
2