ilyassmoummad / scl_icbhi2017

PyTorch implementation of our work: Pretraining Respiratory Sound Representations using Metadata and Contrastive Learning (WASPAA 2023)

The update for M-SCL and SPRSound dataset #9

Closed · chumingqian closed this issue 1 year ago

chumingqian commented 1 year ago

Hi @ilyassmoummad: Have you already updated the code for M-SCL and the SPRSound dataset? I didn't see that. Thanks in advance.

ilyassmoummad commented 1 year ago

Hi @chumingqian, Thank you for your issue. I just added M-SCL and the SPRSound dataset to the code. I also updated the arXiv version; it should be available in the upcoming days (I will update the readme again). I will keep this issue open for a few days; let me know here if you have any error/issue with the updated code. Thanks a lot!

daisukelab commented 1 year ago

Hi @ilyassmoummad, Thank you for your update on SPRSound; however, you may need one more update to ce.py and mscl.py w.r.t. the metric calculation. Specifically, the calculation of TP/GT in the training loops seems to be hard-coded for 4-class labels.

    TP = [0, 0, 0, 0]
    GT = [0, 0, 0, 0]
     :
    for idx in range(len(TP)):
        TP[idx] += torch.logical_and((labels_predicted == idx), (target == idx)).sum().item()
        GT[idx] += (target == idx).sum().item()

Then, have you updated TP/GT to accommodate the 7 classes? I suggest the following fix.

def train_epoch(model, train_loader, train_transform, criterion, optimizer, scheduler, n_classes, K=1):
    TP = [0 for _ in range(n_classes)]
    GT = [0 for _ in range(n_classes)]

With this fix, the score becomes slightly lower, by about 0.5 points, in my tests. So it won't affect your conclusion, but you might want to confirm before WASPAA.
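For reference, here is a minimal self-contained sketch of the generalized accumulation and the resulting score. It assumes, as in the ICBHI protocol, that index 0 is the normal class and the score is the mean of sensitivity and specificity; the helper name and the toy batch are illustrative, not taken from the repo:

    import torch

    def update_counts(TP, GT, labels_predicted, target):
        # Accumulate per-class true positives and ground-truth counts
        for idx in range(len(TP)):
            TP[idx] += torch.logical_and(labels_predicted == idx, target == idx).sum().item()
            GT[idx] += (target == idx).sum().item()

    n_classes = 7  # SPRSound; use 4 for ICBHI
    TP = [0 for _ in range(n_classes)]
    GT = [0 for _ in range(n_classes)]

    # Toy batch of random predictions and targets
    target = torch.randint(0, n_classes, (32,))
    labels_predicted = torch.randint(0, n_classes, (32,))
    update_counts(TP, GT, labels_predicted, target)

    # Sensitivity: recall over the abnormal classes; specificity: recall of the normal class
    se = sum(TP[1:]) / max(sum(GT[1:]), 1)
    sp = TP[0] / max(GT[0], 1)
    score = (se + sp) / 2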

Lastly, thank you very much for sharing your code; it has made some people's lives easier.

(I have used it in my journal paper under review, which focuses on a pre-training method. And as you may know, this Interspeech 2023 paper also appears to build on yours: https://arxiv.org/abs/2305.14032)

ilyassmoummad commented 1 year ago

Hi @daisukelab, Thank you for noticing and pointing that out. I will fix this next week.

I haven't used this code to train my system, as I have another version with wandb and other heavyweight libraries, but I remember that in my original code I did change the list to include the 7 SPRSound classes.

I will fix the code and re-run the experiments to confirm this, and I will let you know next week! (Sorry that I cannot do it sooner.) I am happy that you found my work helpful, and I am also grateful that people like you share their code; your work BYOL-A is an inspiration to us all, thank you for that 🙏

daisukelab commented 1 year ago

Hi @ilyassmoummad

Please feel free to take your time fixing this. People may be using your code for some time to come, so having it fixed would be very helpful for us. I'm not in a hurry either, so it's not a problem.

Also, I am honored that you said that about our BYOL-A, and I hope you enjoy your presentation at WASPAA!

ilyassmoummad commented 1 year ago

Hi @daisukelab ,

Sorry for taking so long; I fixed the code as you suggested and updated the readme. One additional argument compared to the ICBHI experiments is '--mode', which takes either 'inter' or 'intra' to specify which split to use (SPRSound has inter-patient and intra-patient splits).
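For anyone wiring this up themselves, the split selection looks roughly like the sketch below; the '--mode' argument and its values are as described above, but the file-naming scheme is a hypothetical illustration, not the repo's actual layout:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', choices=['inter', 'intra'], default='inter',
                        help='SPRSound split: inter-patient or intra-patient')
    args = parser.parse_args()

    # Hypothetical file naming: pick the annotation files for the chosen split
    train_split = f'sprsound_{args.mode}_train.csv'
    test_split = f'sprsound_{args.mode}_test.csv'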

I have read the Patch-Mix (Interspeech) paper and also your paper with Bioxin Liu (DCASE 2023); it is really good to see that transformers achieve a new state of the art for this task!

I did rerun the experiments for SPRSound and got the same results as the ones in the paper. I will keep this issue open until I get your approval. A quick remark: SPRSound is an easy dataset compared to ICBHI; unlike ICBHI, its data is recorded using the same stethoscope and comes from a single medical center. The scores should be high (>80%) within the first five epochs using SCL or M-SCL on both the inter and intra splits.

Thank you so much for your contributions and your support.

daisukelab commented 1 year ago

Hi @ilyassmoummad ,

Thanks for your response. I understand that only the inter-patient setting should be used for --mode, and yes, "intra" seems less meaningful since we cannot test generalizability under the "intra" setting.

Yes, the transformer models perform better than PANNs' CNNs. I have also observed that the transformer overfits without pre-training and performs poorly (e.g., about a 44% score). Hence, this situation makes it a suitable subject for my pre-training studies.

P.S. The DCASE paper was written by a student (Mr. Liu), and I should be able to show you another result soon (I am keeping my fingers crossed for the ongoing journal review).

Also, thanks for the information about SPRSound. Its data is of quite a different nature, so I will play with it and see how effective the pre-training is.

I appreciate your support as well. Please feel free to close this ticket.

ilyassmoummad commented 1 year ago

Hi @daisukelab ,

Thank you for the details about your experiments; that's good to note. I hope your new paper gets accepted so that we can all benefit from your findings and expertise.

I wish you every success in your upcoming research, and I hope we cross paths again. It was a great pleasure to have this conversation with you.