paperswithlove / papers-we-read


Pegasus-v1 Technical Report #43

Status: Open · runhani opened this issue 2 months ago


Some Links

Pegasus-1 17B

Video LLM (with video encoder Marengo 2.6)


  1. Video encoder model: takes the video frames and the video's ASR transcript (text)
  2. Video-language alignment: a model that aligns the video embeddings to the LLM
  3. LLM: a Transformer decoder
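
The three components can be illustrated with a toy data-flow sketch. Everything concrete here is an assumption, not taken from the report: the sizes `ENC_DIM` and `LLM_DIM`, the random stand-in encoder, treating the ASR transcript as one extra embedding, a single linear projection as the alignment module, and placing the aligned video tokens before the prompt embeddings. It only shows how shapes move from the encoder to the decoder input.

```python
import random
from typing import List

Vector = List[float]

ENC_DIM = 4  # hypothetical encoder embedding size
LLM_DIM = 6  # hypothetical LLM hidden size


def encode_video(frames: List[str], asr_text: str) -> List[Vector]:
    """Stage 1 stand-in (hypothetical): one random ENC_DIM vector per frame,
    plus one extra vector for the ASR transcript."""
    rng = random.Random(0)
    inputs = frames + [asr_text]
    return [[rng.uniform(-1.0, 1.0) for _ in range(ENC_DIM)] for _ in inputs]


def align(video_embs: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Stage 2 stand-in (assumed linear projection): y = W x, mapping ENC_DIM -> LLM_DIM."""
    return [[sum(w * x for w, x in zip(row, emb)) for row in weight] for emb in video_embs]


def build_decoder_input(video_tokens: List[Vector], prompt_tokens: List[Vector]) -> List[Vector]:
    """Stage 3 input (assumed layout): aligned video tokens followed by prompt embeddings."""
    return video_tokens + prompt_tokens


rng = random.Random(1)
W = [[rng.uniform(-1.0, 1.0) for _ in range(ENC_DIM)] for _ in range(LLM_DIM)]  # LLM_DIM x ENC_DIM
video_tokens = align(encode_video(["frame0", "frame1", "frame2"], "spoken words"), W)
prompt = [[0.0] * LLM_DIM for _ in range(2)]  # two placeholder prompt-token embeddings
seq = build_decoder_input(video_tokens, prompt)
print(len(seq), len(seq[0]))  # 6 6
```

A real decoder would then run causal self-attention over `seq`; that stage is omitted here.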

So what is Marengo?

Marengo 2.6

Multimodal Foundation Model for any-to-any search
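
"Any-to-any search" usually means that every modality is embedded into one shared vector space, so a query of one modality can rank items of any other. The sketch below assumes that setup with cosine-similarity ranking; the embeddings, file names, and the `search` helper are hypothetical, and the notes do not describe Marengo's actual retrieval scoring.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Vector, index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Rank every indexed item, whatever its modality, by similarity to the query."""
    scored = [(key, cosine(query, emb)) for key, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-d embeddings; a real model produces much larger vectors.
index = {
    "video:dog_park.mp4": [0.7, 0.3, 0.0],
    "image:cat.jpg": [0.0, 1.0, 0.1],
    "audio:barking.wav": [0.8, 0.0, 0.2],
}
text_query = [1.0, 0.0, 0.1]  # hypothetical embedding of the text "a dog barking"
top = [name for name, _ in search(text_query, index)]
print(top)  # ['audio:barking.wav', 'video:dog_park.mp4']
```

A text query returning an audio clip and a video is exactly the cross-modal case the shared space is meant to enable.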


Spec

Input length: minimum 4 seconds, maximum 20 minutes
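
The input-length range can be expressed as a simple pre-upload check. This is a hypothetical helper: the notes do not say whether the bounds are inclusive or what happens to out-of-range clips (rejection, truncation, chunking), so it only labels a duration.

```python
MIN_SECONDS = 4        # minimum input length from the spec
MAX_SECONDS = 20 * 60  # maximum input length: 20 minutes


def check_duration(duration_s: float) -> str:
    """Classify a clip length against the documented range (bounds assumed inclusive)."""
    if duration_s < MIN_SECONDS:
        return "too short"
    if duration_s > MAX_SECONDS:
        return "too long"
    return "ok"


for d in (3.5, 4, 600, 1200, 1500):
    print(d, check_duration(d))
```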

Training data

Videos: 60M
Images: 500M
Audio: 500K (non-verbal sounds and music)

Evaluation (Marengo 2.6)


Comparison with baselines

Zero-Shot Text-to-Image Retrieval (ZS-T2I)

| Model | MS-COCO R@1 | MS-COCO R@5 | Flickr30K R@1 | Flickr30K R@5 |
|---|---|---|---|---|
| Apple DFN-H/378 (2024.01) | 55.6% | 79.2% | 82.1% | 96.0% |
| Gemini Multimodal Embedding API (2024.02) | 52.73% | 75.80% | 80.26% | 94.28% |
| Ours (Marengo-2.6) | 55.65% (-) | 80.31% (+1.1%) | 84.95% (+2.9%) | 96.7% (+0.7%) |
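
The retrieval tables report Recall@K (R@1, R@5, R@10): the fraction of text queries whose correct item appears among the top K ranked results. A minimal sketch, assuming a precomputed query-by-image similarity matrix and exactly one ground-truth image per query (a simplification; the benchmarks' exact protocols are not covered in these notes). The similarity values are made up.

```python
from typing import List


def recall_at_k(sim: List[List[float]], gt: List[int], k: int) -> float:
    """Fraction of text queries whose ground-truth image is among the top-k ranked images.

    sim[q][i] is the similarity between text query q and image i;
    gt[q] is the index of the matching image for query q.
    """
    hits = 0
    for q, row in enumerate(sim):
        # Image indices sorted by descending similarity to query q.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if gt[q] in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Toy example: 4 text queries, 3 images (hypothetical similarity scores).
sim = [
    [0.9, 0.1, 0.3],  # correct image 0 ranked 1st
    [0.3, 0.5, 0.6],  # correct image 1 ranked 2nd
    [0.7, 0.2, 0.1],  # correct image 2 ranked 3rd
    [0.2, 0.4, 0.8],  # correct image 2 ranked 1st
]
gt = [0, 1, 2, 2]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, gt, k):.2f}")
```

This prints R@1 = 0.50, R@2 = 0.75 and R@3 = 1.00: recall can only grow with K, which is why R@5 and R@10 columns sit well above R@1.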

Zero-Shot Text-to-Audio Retrieval (ZS-T2A)

| Model | Clotho R@1 | Clotho R@10 | AudioCaps R@1 | AudioCaps R@10 |
|---|---|---|---|---|
| Peking Univ. LanguageBind-H (2024.01) | 16.7% | 52.0% | 19.7% | 67.6% |
| Ours (Marengo-2.6) | 17.61% (+0.9%) | 52.25% (+0.3%) | 23.01% (+3.3%) | 69.43% (+1.8%) |
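
The parenthesized numbers in the two retrieval tables appear to be absolute percentage-point gains over the best baseline in each column, rounded to one decimal. The check below recomputes them with exact decimal arithmetic. Rounding half up is an assumption inferred from the data: it is needed to reproduce +2.9 (Flickr30K R@1, exact gain 2.85) and +0.3 (Clotho R@10, exact gain 0.25). The MS-COCO R@1 gain is only 0.05 pp, which may be why it is shown as (-).

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List


def pp_gain(ours: str, baselines: List[str]) -> Decimal:
    """Gain of `ours` over the best baseline in percentage points, rounded to 0.1 (half up)."""
    best = max(Decimal(b) for b in baselines)
    return (Decimal(ours) - best).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# column: (Marengo-2.6, [baselines], reported gain), copied from the retrieval tables
rows = {
    "MS-COCO R@5": ("80.31", ["79.2", "75.80"], "1.1"),
    "Flickr30K R@1": ("84.95", ["82.1", "80.26"], "2.9"),
    "Flickr30K R@5": ("96.7", ["96.0", "94.28"], "0.7"),
    "Clotho R@1": ("17.61", ["16.7"], "0.9"),
    "Clotho R@10": ("52.25", ["52.0"], "0.3"),
    "AudioCaps R@1": ("23.01", ["19.7"], "3.3"),
    "AudioCaps R@10": ("69.43", ["67.6"], "1.8"),
}
for name, (ours, baselines, reported) in rows.items():
    gain = pp_gain(ours, baselines)
    print(f"{name}: computed +{gain}, reported +{reported}, match={gain == Decimal(reported)}")

# MS-COCO R@1 is reported as "(-)": the unrounded gain is tiny.
print(Decimal("55.65") - max(Decimal("55.6"), Decimal("52.73")))  # 0.05
```

`Decimal` is used instead of `float` so that values such as 2.85 are not misrounded by binary representation error.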

Fine, suppose the encoder is well built. What comes next?

Evaluation (Pegasus-1)


Video Question Answering

| Model | ActivityNet-QA test split (%) | NExT-QA test split (%) |
|---|---|---|
| Video-ChatGPT | 35.2 | - |
| VideoChat2 | 49.1 | 61.7 |
| Gemini 1.0 Pro | 49.8 | 28.0 |
| Gemini 1.0 Ultra | 52.2 | 29.9 |
| Gemini 1.5 Pro | 56.7 | - |
| Pegasus-1 | 59.9 | 71.1 |

Video Conversations

| Model | Correctness of Information | Detailed Orientation | Contextual Understanding | Temporal Understanding | Consistency | Average |
|---|---|---|---|---|---|---|
| Video-ChatGPT | 2.40 | 2.52 | 2.62 | 1.98 | 2.37 | 2.38 |
| VideoChat2 | 3.02 | 2.88 | 3.51 | 2.66 | 2.81 | 2.98 |
| Gemini 1.0 Pro | 2.98 | 2.99 | 3.44 | 2.32 | 2.32 | 2.81 |
| Pegasus-1 | 3.79 | 3.76 | 4.29 | 3.34 | 4.03 | 3.84 |

Video Summarization

| Model | Correctness of Information | Detailed Orientation | Contextual Understanding | Average |
|---|---|---|---|---|
| Vendor A | 0.73 | 0.80 | 0.91 | 0.81 |
| Whisper + ChatGPT-3.5 | 0.49 | 0.79 | 0.68 | 0.65 |
| Video-ChatGPT | 1.19 | 1.33 | 1.42 | 1.31 |
| VideoChat2 | 1.78 | 1.52 | 1.98 | 1.76 |
| Gemini 1.0 Pro | 1.65 | 1.69 | 1.94 | 1.76 |
| Pegasus-1 | 2.30 | 2.58 | 2.75 | 2.54 |
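
In both the conversation and summarization tables, the Average column appears to be the unweighted mean of the per-criterion scores, rounded to two decimals. The check below recomputes it from the table values; equal weighting is inferred from the numbers, not stated in the notes.

```python
from statistics import mean

# model: ([per-criterion scores], reported Average), copied from the tables
conversation = {
    "Video-ChatGPT": ([2.40, 2.52, 2.62, 1.98, 2.37], 2.38),
    "VideoChat2": ([3.02, 2.88, 3.51, 2.66, 2.81], 2.98),
    "Gemini 1.0 Pro": ([2.98, 2.99, 3.44, 2.32, 2.32], 2.81),
    "Pegasus-1": ([3.79, 3.76, 4.29, 3.34, 4.03], 3.84),
}
summarization = {
    "Vendor A": ([0.73, 0.80, 0.91], 0.81),
    "Whisper + ChatGPT-3.5": ([0.49, 0.79, 0.68], 0.65),
    "Video-ChatGPT": ([1.19, 1.33, 1.42], 1.31),
    "VideoChat2": ([1.78, 1.52, 1.98], 1.76),
    "Gemini 1.0 Pro": ([1.65, 1.69, 1.94], 1.76),
    "Pegasus-1": ([2.30, 2.58, 2.75], 2.54),
}


def average_matches(table):
    """Per model: does the unweighted mean, rounded to 2 decimals, equal the reported Average?"""
    return {model: round(mean(scores), 2) == reported for model, (scores, reported) in table.items()}


print(average_matches(conversation))
print(average_matches(summarization))
```

Every model in both tables matches, so the Average column adds no information beyond the individual criteria.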