paperswithlove / papers-we-read


AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description #42

Open runhani opened 2 months ago


Some Links

ArXiv: https://arxiv.org/abs/2407.15850

Code: https://github.com/Jyxarthur/AutoAD-Zero

TV-AD: https://www.robots.ox.ac.uk/~vgg/research/autoad-zero/#tvad

What is AD?

After 1, 2, 3, the next one is not 4 but Zero?!

Details

image

Face detection: RetinaFace

Face recognition: ArcFace (threshold: 0.2)

Registration DB: IMDb character bank

FR → VideoLLaMA2-7B: colored circle with 8 frames
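
The face-recognition step above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the 0.2 threshold applies to cosine similarity between a detected face embedding and each entry of the character bank, and that the 8 frames are sampled uniformly from the clip. The function names, the color palette, and the toy 3-dimensional embeddings are all hypothetical; real ArcFace embeddings are much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_character(face_embedding, character_bank, threshold=0.2):
    """Return the best-matching character name, or None if no score exceeds the threshold.

    Assumption: the threshold (0.2 in the notes) is a lower bound on cosine similarity.
    """
    best_name, best_score = None, threshold
    for name, reference in character_bank.items():
        score = cosine_similarity(face_embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def assign_circle_colors(names, palette=("red", "green", "blue", "yellow")):
    """Give each recognized character one circle color, in order of first appearance."""
    colors = {}
    for name in names:
        if name is not None and name not in colors:
            colors[name] = palette[len(colors) % len(palette)]
    return colors


def sample_frame_indices(num_frames, k=8):
    """Pick k frame indices spread evenly over the clip (assumed sampling scheme)."""
    if num_frames <= 0:
        return []
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


# Toy character bank with hypothetical 3-dimensional embeddings.
bank = {"Ross": [1.0, 0.0, 0.0], "Rachel": [0.0, 1.0, 0.0]}
detected = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.9, 0.0]]
names = [match_character(face, bank) for face in detected]
circle_colors = assign_circle_colors(names)
```

The resulting name-to-color mapping is what a text prompt could refer to (for example, "the person circled in red is Ross") when the circled frames are passed to the video-language model. The exact prompt wording used by the paper is not covered in these notes.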

MAD-Eval: Movie Audio Descriptions

CMD-AD: Condensed Movie Dataset

TV-AD: TV series (Friends, The Big Bang Theory, etc.)

Pros

Cons

Ablations

image

image

image

Miscellaneous

Points to discuss regarding video understanding models

image

Points to discuss regarding AutoAD