sunlicai / HiCMAE

[Information Fusion 2024] HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
MIT License

Two questions about the WereWolf-XL and AVCAffe datasets #10

Closed · Yukiyousa closed this issue 2 weeks ago

Yukiyousa commented 2 months ago

Dear authors,

Thank you for your valuable contribution! I have two questions regarding the datasets as follows:

  1. I noticed that you provide split files for the WereWolf-XL dataset in the repository. However, the code implementation seems to focus only on discrete emotion recognition; there is no implementation for continuous emotion recognition on WereWolf-XL. Could you please guide me, or point me to some references, on how to modify the code for continuous emotion recognition on the WereWolf-XL dataset?

  2. I have downloaded the AVCAffe dataset, but I couldn't find information on how to create labels that match the description in your paper: "Note that the arousal and valence scores are given on a scale of 1-4, and we follow the original paper to formulate their prediction as a classification task instead of a regression one." Could you please tell me how to generate such label files?

Looking forward to your response!

Thank you very much!

sunlicai commented 2 months ago

Hi @Yukiyousa, thanks for your interest in our work.

  1. To adapt the code to continuous emotion recognition, you need to modify the loss function (e.g., use MSE loss) and the evaluation metrics (e.g., MSE, PCC, and CCC).
  2. It's very easy to generate the discrete labels by defining the following mapping:

label2idx = {
    'arousal': {'Excited': 0, 'Wide-awake': 1, 'Neutral': 2, 'Dull': 3, 'Calm': 4},
    'valence': {'Pleasant': 0, 'Pleased': 1, 'Neutral': 2, 'Unsatisfied': 3, 'Unpleasant': 4},
}
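
For instance, applying this mapping to the self-reported annotations could look like the sketch below (the file name and column names are hypothetical, not the actual AVCAffe layout):

import pandas as pd

# label2idx: the mapping defined above
df = pd.read_csv('avcaffe_self_reports.csv')  # hypothetical annotation file
df['arousal_label'] = df['arousal'].map(label2idx['arousal'])
df['valence_label'] = df['valence'].map(label2idx['valence'])
df.to_csv('avcaffe_labels.csv', index=False)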

Hope this answers your questions!

Yukiyousa commented 2 months ago

Thank you for your timely reply.

Yukiyousa commented 2 months ago

Dear @sunlicai, I apologize for bothering you again. I still have a few questions regarding the AVCAffe dataset.

In your SVFAP work, the paper mentions: "We follow the paper [AVCAffe] to obtain video-level predictions by averaging clip-level scores and employ the weighted F1-score as the evaluation metric." Could you please confirm whether the results reported in the HiCMAE paper follow the same approach?

If so, could you kindly provide the official split files for the AVCAffe dataset?

Additionally, could you please guide me on how to modify the code and parameters in the HiCMAE codebase to obtain video-level predictions by averaging clip-level scores?
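
For reference, my current understanding of this clip-to-video aggregation is the following sketch (the variable names and the sklearn-based weighted F1 are my own assumptions, not code from the HiCMAE repository):

import numpy as np
from collections import defaultdict
from sklearn.metrics import f1_score

def video_level_weighted_f1(video_ids, clip_logits, video_labels):
    """
    :param video_ids: length-N list, source video ID of each clip
    :param clip_logits: (N, C) numpy array of clip-level class scores
    :param video_labels: dict mapping video ID -> ground-truth class index
    :return: weighted F1-score computed over videos
    """
    # Group clip-level scores by source video and average them.
    buckets = defaultdict(list)
    for vid, logits in zip(video_ids, clip_logits):
        buckets[vid].append(logits)
    preds, labels = [], []
    for vid, logits_list in buckets.items():
        preds.append(int(np.mean(logits_list, axis=0).argmax()))
        labels.append(video_labels[vid])
    return f1_score(labels, preds, average='weighted')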

Thank you for your assistance and contributions to the research community once again!

Yukiyousa commented 2 months ago

Dear @sunlicai, you can ignore the AVCAffe-related issues I raised earlier, as I have mostly resolved them.

Currently, I am encountering some challenges with the WereWolf-XL dataset. I am a beginner in the field of affective computing, especially when it comes to continuous emotion recognition.

Could you kindly provide more guidance on how to adapt the HiCMAE codebase for continuous emotion recognition on the WereWolf-XL dataset?

I have noticed that, aside from modifying the loss function and evaluation metrics, other changes might be required, such as setting the activation function in the classification head.

Thank you once again for your kind assistance and contributions to the research community!

sunlicai commented 2 months ago

For the loss function, you can use mean squared error as follows:

criterion = torch.nn.MSELoss()
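
As for the activation function in the head, you typically change the output dimension to the number of continuous targets and pick the final activation according to the label range. A minimal illustrative sketch (the class name and embedding dimension below are placeholders, not the actual module names in the codebase):

import torch
import torch.nn as nn

class RegressionHead(nn.Module):
    def __init__(self, embed_dim=512, num_targets=2):  # e.g., arousal and valence
        super().__init__()
        self.fc = nn.Linear(embed_dim, num_targets)

    def forward(self, x):
        # No activation (identity) for unbounded or standardized targets;
        # use torch.sigmoid for labels in [0, 1] or torch.tanh for [-1, 1].
        return self.fc(x)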

For evaluation metrics, I provide some examples here:

import numpy as np
from scipy.stats import pearsonr

def cal_mse(output, target):
    """
    :param output: (B, C), numpy array
    :param target: (B, C), numpy array
    :return: scalar
    """
    mse = np.square(output - target).mean()
    return mse

def cal_pcc(output, target):
    """
    :param output: (B, C), numpy array
    :param target: (B, C), numpy array
    :return: scalar
    """
    num_samples, n_dims = output.shape
    if num_samples == 1:  # pearsonr requires at least two samples ('x and y must have length at least 2.')
        pccs = [1.0] * n_dims  # degenerate fallback for a single sample
    else:
        pccs = [pearsonr(output[:,i], target[:,i])[0] for i in range(n_dims)]
    pcc = np.mean(pccs)
    return pcc

def cal_ccc(output, target):
    """
    :param output: (B, C), numpy array
    :param target: (B, C), numpy array
    :return: scalar
    """
    n_dims = output.shape[-1]
    cccs = []
    for i in range(n_dims):
        preds, labels = output[:,i], target[:,i]
        preds_mean, labels_mean = np.mean(preds), np.mean(labels)
        cov_mat = np.cov(preds, labels)  # 2x2 covariance matrix; np.cov is unbiased (ddof=1) by default
        covariance = cov_mat[0, 1]
        preds_var, labels_var = cov_mat[0, 0], cov_mat[1, 1]

        # pcc = covariance / np.sqrt(preds_var * labels_var)
        ccc = 2.0 * covariance / (preds_var + labels_var + (preds_mean - labels_mean) ** 2)

        cccs.append(ccc)
    ccc = np.mean(cccs)
    return ccc
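
A quick sanity check of these functions with random arrays (shapes only, not real model outputs):

if __name__ == '__main__':
    output = np.random.rand(8, 2)  # 8 samples, 2 continuous dims (e.g., arousal and valence)
    target = np.random.rand(8, 2)
    print('MSE: %.4f, PCC: %.4f, CCC: %.4f'
          % (cal_mse(output, target), cal_pcc(output, target), cal_ccc(output, target)))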