HAMZA310 opened this issue 3 years ago
Hi @HAMZA310
Thanks for the suggestion. The proposal sounds good to me. One question regarding the multimodal part: how do you propose to handle video in torchaudio? torchvision has an ffmpeg binding that, I believe, can handle both audio and video, but torchaudio does not have such a capability at the moment. So I am wondering whether this might be a better fit for torchvision.
cc @fmassa @datumbox @NicolasHug
Hi @mthrok
Thanks for your response. About the multimodal part, I'm proposing to add only the audio stream to torchaudio, similar to this release.
The audio stream in CREMA-D is not dependent on the visual recordings by any means and could be considered a standalone audio dataset.
🚀 Feature
CREMA-D is an audio-visual dataset for emotion recognition, with actors from a variety of races and ethnicities.
The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral).
It contains 7,442 clips from 91 actors (48 male and 43 female).
torchaudio should contain the audio stream from the original audio-visual recording.
Motivation
Pitch
I have recently used this dataset with PyTorch in one of my projects. I'd be happy to open a pull request, making sure the implementation follows the recent template changes.
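For context, a dataset implementation would recover the emotion label and actor metadata from the clip filenames, which in CREMA-D follow an `ActorID_SentenceID_Emotion_Intensity.wav` convention. A minimal sketch of such a parser is below; the helper name and the emotion-code mapping are illustrative assumptions, not part of any existing torchaudio API:

```python
from pathlib import Path

# Assumed mapping from CREMA-D emotion codes to label strings
# (illustrative; the actual dataset class would define its own labels).
_EMOTIONS = {
    "ANG": "anger",
    "DIS": "disgust",
    "FEA": "fear",
    "HAP": "happy",
    "NEU": "neutral",
    "SAD": "sad",
}


def parse_cremad_filename(filename):
    """Split a CREMA-D style filename into (actor_id, sentence, emotion, intensity).

    Example filename: "1001_DFA_ANG_XX.wav"
    """
    stem = Path(filename).stem            # drop the ".wav" extension
    actor, sentence, emotion, intensity = stem.split("_")
    return int(actor), sentence, _EMOTIONS.get(emotion, emotion), intensity
```

A `Dataset.__getitem__` would then pair this metadata with the waveform loaded via `torchaudio.load`, mirroring the structure of the other torchaudio datasets.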
Additional context
This is the reference paper for CREMA-D.