Can I use AV-Align to assess video-to-audio generation?

guyyariv / TempoTokens

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

MIT License


Closed. BingliangLi closed this issue 1 month ago

BingliangLi commented 1 month ago

Hi, I think the AV-Align score is a brilliant idea. Would it make sense to use it to assess V2A (video-to-audio) generation instead of A2V (audio-to-video)?

guyyariv commented 1 month ago

Hey, thank you! That makes sense to me, and a recent study has already addressed this (see https://arxiv.org/abs/2407.07464).

BingliangLi commented 1 month ago

Thanks!