seungheondoh / lp-music-caps

LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
https://huggingface.co/papers/2307.16372

Can the model only handle 10-second audio? Can it handle 30 s, 40 s, or 60 s? #13

Open bocaidoufuyushiren opened 3 days ago

seungheondoh commented 2 days ago

Hi @bocaidoufuyushiren, if you want to run inference on longer audio with the current model, one option is to split the audio into 10-second segments and run inference on each segment separately.
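For illustration, the segmenting step could look roughly like the sketch below. It is not code from this repository: `split_into_segments` is a hypothetical helper, the 16 kHz sample rate and the zero-padding of the last segment are assumptions, and the input is assumed to be a mono waveform already loaded as a NumPy array.

```python
import numpy as np


def split_into_segments(waveform, sample_rate, segment_seconds=10.0):
    """Split a mono waveform into fixed-length segments.

    Hypothetical helper: the final segment is zero-padded so that every
    row has the same length. Returns an array of shape
    (n_segments, segment_len).
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a mono 1-D waveform")
    segment_len = int(round(sample_rate * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    # At least one segment, even for empty input.
    n_segments = max(1, int(np.ceil(len(waveform) / segment_len)))
    padded = np.zeros(n_segments * segment_len, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_segments, segment_len)


# Example: 35 s of placeholder audio at an assumed 16 kHz sample rate.
sample_rate = 16000
audio = np.zeros(35 * sample_rate, dtype=np.float32)
segments = split_into_segments(audio, sample_rate)

for i, seg in enumerate(segments):
    start = i * 10
    # Each `seg` would be passed to the captioning model here.
    print(f"segment {i}: {start}s-{start + 10}s, {seg.shape[0]} samples")
```

Each row of `segments` can then be captioned on its own, giving one caption per 10-second window. Whether to zero-pad or drop a short final segment is a design choice that depends on how the model reacts to trailing silence.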