pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Add "Crowd-sourced Emotional Multimodal Actors CREMA-D" Dataset #1268

Open HAMZA310 opened 3 years ago

HAMZA310 commented 3 years ago

🚀 Feature

Motivation

Pitch

I recently used this dataset with PyTorch in one of my projects. I'd be happy to open a pull request, and I will make sure the implementation follows the recent template changes.

Additional context

This is the reference paper for CREMA-D.

mthrok commented 3 years ago

Hi @HAMZA310

Thanks for the suggestion. The proposal sounds good to me. One question regarding the multimodal part: how do you propose to handle video in torchaudio? torchvision has an ffmpeg binding that, I believe, can handle both audio and video, but torchaudio does not have such a capability at the moment. So I am wondering whether this would fit better in torchvision.

cc @fmassa @datumbox @NicolasHug

HAMZA310 commented 3 years ago

Hi @mthrok

Thanks for your response. Regarding the multimodal part, I'm proposing to add only the audio stream to torchaudio, similar to this release. The audio stream in CREMA-D does not depend on the visual recordings in any way and can be treated as a standalone audio dataset.
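
To make the audio-only idea concrete, here is a minimal sketch of what such a dataset could look like. It is not the proposed implementation and does not follow any particular torchaudio template. The names `CremaDItem`, `parse_cremad_filename`, and `CremaDAudio` are hypothetical, and the sketch assumes the audio files are named `<actor>_<sentence>_<emotion>_<level>.wav` (for example `1001_DFA_ANG_XX.wav`) and sit in one flat directory. Decoding is left to a caller-supplied `loader`, so the sketch itself has no dependency on torchaudio.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, List, Tuple, Union


@dataclass(frozen=True)
class CremaDItem:
    """Metadata recovered from one file name (hypothetical helper type)."""
    path: Path
    actor_id: str
    sentence: str
    emotion: str
    level: str


def parse_cremad_filename(path: Path) -> CremaDItem:
    # Assumed naming pattern: <actor>_<sentence>_<emotion>_<level>.wav
    parts = path.stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {path.name}")
    actor_id, sentence, emotion, level = parts
    return CremaDItem(path, actor_id, sentence, emotion, level)


class CremaDAudio:
    """Audio-only view of the dataset; video files are never touched."""

    def __init__(self, root: Union[str, Path], loader: Callable[[Path], Any]) -> None:
        self._loader = loader
        # Index only the .wav files, in a deterministic (sorted) order.
        self._items: List[CremaDItem] = [
            parse_cremad_filename(p) for p in sorted(Path(root).glob("*.wav"))
        ]

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, n: int) -> Tuple[Any, CremaDItem]:
        item = self._items[n]
        # The loader decides how audio is decoded; the dataset only indexes.
        return self._loader(item.path), item
```

With torchaudio installed, the loader could be something like `lambda p: torchaudio.load(str(p))`, which returns a waveform tensor and its sample rate. Because only `.wav` files are indexed, nothing here requires video decoding support.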