Huber-Emotion_Detection uses the Hugging Face Transformers library to detect emotion in speech audio. It classifies each recording into one of six emotion categories using the "facebook/hubert-base-ls960" model. The dataset used for training and evaluation is the ShEMO Persian Speech Emotion Detection Database.
Make sure you have Python installed, then install the required packages using the following commands:
pip install torch
pip install transformers
pip install librosa
Clone the repository and change into its directory:
git clone https://github.com/your-username/Huber-Emotion_Detection.git
cd Huber-Emotion_Detection
Download the dataset from here and place it in the appropriate directory ('dataset/archive/').
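As a quick sanity check after downloading, the sketch below counts recordings per emotion. It assumes the ShEMO naming convention in which the fourth character of each file name encodes the emotion (for example `F01A01.wav` for anger); the helper names `label_from_filename` and `count_labels` are hypothetical and not part of this repository, so adjust the mapping if your copy of the dataset is organized differently.

```python
from collections import Counter
from pathlib import Path

# Assumed ShEMO emotion codes (fourth character of the file name).
EMOTION_CODES = {
    "A": "anger",
    "F": "fear",
    "H": "happiness",
    "N": "neutral",
    "S": "sadness",
    "W": "surprise",
}


def label_from_filename(name: str) -> str:
    """Return the emotion label encoded in a ShEMO-style file name."""
    stem = Path(name).stem
    if len(stem) < 4 or stem[3] not in EMOTION_CODES:
        raise ValueError(f"unrecognized file name: {name}")
    return EMOTION_CODES[stem[3]]


def count_labels(dataset_dir: str = "dataset/archive/") -> Counter:
    """Count recordings per emotion under dataset_dir (searched recursively)."""
    wav_files = Path(dataset_dir).rglob("*.wav")
    return Counter(label_from_filename(p.name) for p in wav_files)
```

Calling `count_labels()` from the repository root should then give a per-class count, which is useful for spotting class imbalance before training.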
Run the training script, which preprocesses the data and trains the model:
python train.py
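# Load the pretrained HuBERT encoder and build the classification head
import numpy as np
import torch
import torch.nn as nn
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
embedding = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h')
model_name = "facebook/hubert-base-ls960"
# Downloading ~350 MB
HuBERT = AutoModel.from_pretrained(model_name, output_hidden_states=True).to(device)
# Classification head: 768-dim HuBERT features -> 6 emotion classes.
# It is randomly initialized here; load your trained weights for meaningful predictions.
classifier = nn.Sequential(nn.Dropout(0.5),
                           nn.Linear(768, 128),
                           nn.ReLU(),
                           nn.Linear(128, 6)).to(device)
# Switch to inference mode so dropout is disabled
HuBERT.eval()
classifier.eval()
# Load an audio file as a 16 kHz mono waveform, e.g.:
#   import librosa; waveform, _ = librosa.load("...", sr=16000)
# A one-second random signal stands in for real audio here
waveform = np.random.randn(16000).astype(np.float32)
# Perform emotion classification
with torch.no_grad():
    inputs = embedding(waveform, sampling_rate=16000, return_tensors='pt').input_values.to(device)
    outputs = HuBERT(inputs)
    # Average over time frames to get one 768-dim vector per clip
    outputs = outputs.last_hidden_state.mean(dim=1)
    outputs = classifier(outputs)
print(outputs)  # logits of shape [1, 6]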
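The classifier prints raw scores (logits), one per emotion class. A minimal sketch of turning those scores into a readable prediction follows. It uses only the standard library so it can be tried without the model; `torch.softmax` and `torch.argmax` would do the same job on tensors. The label order in `LABELS` is an assumption, since the index-to-emotion mapping used by `train.py` is not shown here, and `predict_label` is a hypothetical helper name.

```python
import math

# Assumed label order; adjust this list to match the mapping used in training.
LABELS = ["anger", "fear", "happiness", "neutral", "sadness", "surprise"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

With the tensor from the snippet above, one clip's scores can be passed as `predict_label(outputs[0].tolist())`.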
Contributions to the project are welcome! If you want to contribute, please follow these steps: