raymin0223 / patch-mix_contrastive_learning

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)

How to calculate the mean and standard? #8

Closed LWYgut closed 4 months ago

LWYgut commented 4 months ago

Hi @raymin0223, thank you for open-sourcing the code! I noticed that your paper includes the phrase "We also applied the standard normalization on the spectrograms with the mean and standard deviation of –4.27 and 4.57, respectively." However, I cannot reproduce those values with the code you provided: when I calculate the mean and standard deviation using your code, I get -9.0943 and 3.5168, respectively.

Here are my calculation steps. Did you calculate the mean and standard deviation this way?

Step 1: Comment out the normalization in the generate_fbank function in the icbhi_util.py file.

Step 2: Calculate the mean and standard deviation in the set_loader function in the main.py file.
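The two steps above could be sketched roughly as follows. This is a hypothetical reconstruction, not code from the repository: the helper name `dataset_mean_std` and the use of plain NumPy arrays are assumptions. It computes the global mean and standard deviation over all un-normalized spectrograms, as in Step 2.

```python
import numpy as np

def dataset_mean_std(spectrograms):
    """Global mean/std over a list of un-normalized log-mel
    spectrogram arrays (Step 2 above)."""
    # Flatten every time-frequency bin of every sample into one vector
    values = np.concatenate([s.ravel() for s in spectrograms])
    return float(values.mean()), float(values.std())
```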

Thank you very much !

LWYgut commented 4 months ago

Or could you share how you calculated the mean and standard deviation? Looking forward to your reply!

kaen2891 commented 4 months ago

Hi @LWYgut

Because of the AST setting, we used a mean and std of -4.27 and 4.57 for data normalization. This is implemented in util/icbhi_util.py, where the generate_fbank function is called from util/icbhi_dataset.py.
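For reference, the AST recipe applies these statistics roughly as below. This is a hedged sketch rather than the repository's exact code; the factor of 2 on the std follows the AST authors' convention so that values fall roughly in [-1, 1].

```python
AST_MEAN, AST_STD = -4.27, 4.57  # values quoted in this thread

def normalize_fbank(fbank):
    # AST-style standardization; dividing by 2 * std is the AST convention
    return (fbank - AST_MEAN) / (AST_STD * 2)
```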

LWYgut commented 4 months ago

> Hi @LWYgut
>
> Because of the AST setting, we used a mean and std of -4.27 and 4.57 for data normalization. This is implemented in util/icbhi_util.py, where the generate_fbank function is called from util/icbhi_dataset.py.

Thank you for your patient answer, but I still have the following questions:

Q1: When the pre-training dataset is different from the downstream task dataset, how should normalization be performed? (Should I use the mean and std of the pre-training dataset to normalize the downstream dataset, or the mean and std of the downstream dataset itself?)

Q2: Are the mean and std parameters (-4.27 and 4.57) applicable to all models (such as ResNet, VGG, CNNs, ...)?

Q3: Are the values (-4.27 and 4.57) applicable to all datasets?

Q4: Following from Q2 and Q3, if I want to use another dataset or a different model, how should I calculate these two values?

Could you answer these questions for me? They have been puzzling me for a long time, and I would appreciate it. Good luck with your studies!

kaen2891 commented 4 months ago

Hi @LWYgut

Q1 & Q2: Many open-source deep learning models are trained on large-scale datasets, and we can categorize these models in two ways: self-supervised and supervised.

When we exploit a self-supervised model for our downstream tasks, we generally compute the mean and std from the downstream dataset and apply instance-level or global normalization.

For a pretrained supervised model, e.g., AST, we can use the pretrained model's mean and std if the authors have published them. In the AST case, the authors recommended a mean and std of -4.27 and 4.57, so we followed their recommendation. Otherwise, we can compute the mean and std from our own data.

Q3: If you plan to use AST with other datasets, my opinion is yes.

Q4: If you do not plan to use AST, you can compute the mean and std from your own data and apply them to each sample. We call this instance-level normalization.
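Instance-level normalization as described here might look like the following sketch (the function name is hypothetical, and `eps` is added to guard against division by zero for silent inputs):

```python
import numpy as np

def instance_normalize(spec, eps=1e-8):
    """Standardize one spectrogram with its own mean and std."""
    return (spec - spec.mean()) / (spec.std() + eps)
```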

LWYgut commented 4 months ago

I would like to express my deep gratitude for your patient response, and I wish you progress in your scientific research! Thank you very much!

raymin0223 commented 2 months ago

Hi @LWYgut,

Though @kaen2891 already gave very good and exact answers, I'll also leave some explanation from my perspective (sorry for the late reply; I had seen this issue earlier, though).

As kaen2891 answered, we used the values of –4.27 and 4.57, which were used for pretraining AST in the original GitHub repository, because we fine-tuned AST from their pretrained checkpoints.

Regarding Q1, to my knowledge it is better to use the same mean and std values that were used for pretraining. Thus, even if a fine-tuning dataset has different normalization statistics, we should normalize it with the values of the pretraining dataset.

Q2: normalization values are determined by datasets, so the model architecture does not matter. But as I mentioned, if we start from a pretrained checkpoint, it may be better to use its normalization values, unless we have a vast amount of fine-tuning data.

Q3: to be honest, it would be best to calculate the mean and std of each dataset. But it can depend on whether the model is trained from scratch or from a pretrained checkpoint.

Q4, you can refer to this. We have also tried to use fine-tuning datasets' normalization values, there was no big difference tho. https://github.com/raymin0223/patch-mix_contrastive_learning/blob/ebc40cddc6d90b32dfbe8d60c2b7812d7e1e14f1/util/icbhi_util.py#L64-L82