raymin0223 / patch-mix_contrastive_learning

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)

Patch-Mix Contrastive Learning (INTERSPEECH 2023)

arXiv | BibTeX

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Sangmin Bae*, June-Woo Kim*, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim$^\dagger$, Se-Young Yun$^\dagger$
* equal contribution    $^\dagger$ corresponding authors

Requirements

Install the necessary packages with:

$ pip install torch torchvision torchaudio
$ pip install -r requirements.txt

Data Preparation

Download the ICBHI dataset files from the official page.

$ wget https://bhichallenge.med.auth.gr/sites/default/files/ICBHI_final_database/ICBHI_final_database.zip

All *.wav and *.txt files should be saved in data/icbhi_dataset/audio_test_data.
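If you prefer to script this step, the following is a minimal sketch that copies every *.wav and *.txt member of the downloaded archive into the expected directory. The function name `extract_icbhi` is hypothetical and not part of this repo, and the sketch assumes nothing about the folder layout inside the zip: it simply flattens all matching files into the target directory.

```python
import zipfile
from pathlib import Path


def extract_icbhi(zip_path, target_dir="data/icbhi_dataset/audio_test_data"):
    """Copy every *.wav and *.txt member of the archive into target_dir.

    Hypothetical helper (not part of the repo). Sub-folders inside the
    archive are flattened, so only the file names are kept.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            name = Path(member.filename).name
            # Skip directories and anything that is not audio or annotation text.
            if member.is_dir() or not name.lower().endswith((".wav", ".txt")):
                continue
            (target / name).write_bytes(archive.read(member))
            copied.append(name)
    return sorted(copied)


if __name__ == "__main__":
    files = extract_icbhi("ICBHI_final_database.zip")
    print(f"copied {len(files)} files")
```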

Note that the ICBHI dataset consists of 6,898 respiratory cycles in 920 annotated audio samples from 126 subjects. Of these cycles, 1,864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes.
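As a quick sanity check on these numbers, the sketch below derives the number of remaining (normal) cycles and the share of each class. It assumes the three abnormal counts are disjoint (crackles only, wheezes only, both), which leaves 3,642 normal cycles. The dictionary keys are illustrative names, not identifiers from the repo.

```python
TOTAL_CYCLES = 6898

# Abnormal counts as stated above, assumed to be mutually exclusive.
counts = {"crackle": 1864, "wheeze": 886, "both": 506}

# Every cycle that is not abnormal is treated as normal.
counts["normal"] = TOTAL_CYCLES - sum(counts.values())

for name, n in counts.items():
    print(f"{name:>8}: {n:5d} cycles ({100 * n / TOTAL_CYCLES:.2f}%)")
```

The resulting distribution is imbalanced towards the normal class, which is worth keeping in mind when reading the evaluation metric.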

Training

To train the model, run the shell scripts in scripts/.

  1. scripts/icbhi_ce.sh: Cross-Entropy loss with AST model.
  2. scripts/icbhi_patchmix_ce.sh: Patch-Mix loss with AST model, where the label depends on the interpolation ratio.
  3. scripts/icbhi_patchmix_cl.sh: Patch-Mix contrastive loss with AST model.
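To illustrate what "the label depends on the interpolation ratio" means in the second script, here is a framework-free sketch of one plausible reading: a random subset of patches in one sample is replaced by the patches at the same positions in another sample, and the cross-entropy terms of the two labels are weighted by the fraction of patches each sample contributes. All names (`patch_mix`, `patch_mix_ce`, `mix_ratio`) are hypothetical, and this is not the repo's implementation, which operates on AST patch embeddings in PyTorch.

```python
import math
import random


def patch_mix(patches_a, patches_b, mix_ratio, rng):
    """Replace a random subset of A's patches with B's patches.

    Returns the mixed sequence and lam, the fraction of patches
    that still come from sample A.
    """
    num_patches = len(patches_a)
    num_swapped = round(mix_ratio * num_patches)
    swapped = set(rng.sample(range(num_patches), num_swapped))
    mixed = [patches_b[i] if i in swapped else patches_a[i]
             for i in range(num_patches)]
    lam = 1.0 - num_swapped / num_patches
    return mixed, lam


def cross_entropy(probs, label):
    """Negative log-likelihood of the given label."""
    return -math.log(probs[label])


def patch_mix_ce(probs, label_a, label_b, lam):
    """Cross-entropy against both labels, weighted by the mixing ratio."""
    return (lam * cross_entropy(probs, label_a)
            + (1.0 - lam) * cross_entropy(probs, label_b))


if __name__ == "__main__":
    rng = random.Random(0)
    mixed, lam = patch_mix(["a"] * 8, ["b"] * 8, mix_ratio=0.25, rng=rng)
    # Predicted class probabilities here are made-up illustration values.
    loss = patch_mix_ce([0.7, 0.1, 0.1, 0.1], label_a=0, label_b=2, lam=lam)
    print(mixed, lam, loss)
```

With `lam = 1.0` no patch is replaced and the loss reduces to the plain cross-entropy of the first script.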

Important arguments for different data settings.

Important arguments for models.

Important argument for evaluation.

The pretrained model checkpoints will be saved at save/[EXP_NAME]/best.pth.

Result

Patch-Mix Contrastive Learning achieves a state-of-the-art Score of 62.37% on the ICBHI dataset, which is 4.08% higher than the previous best Score.
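For reference, the ICBHI Score is commonly computed as the mean of sensitivity and specificity. The sketch below shows that computation on toy labels. The function name `icbhi_score` and the convention that class index 0 means "normal" are assumptions for illustration and may differ from the repo's evaluation code.

```python
def icbhi_score(labels, preds, normal_class=0):
    """Return (sensitivity, specificity, score) for multi-class predictions.

    Specificity: fraction of normal cycles predicted as normal.
    Sensitivity: fraction of abnormal cycles predicted as their exact class.
    Score: mean of the two.
    """
    normal_total = sum(1 for y in labels if y == normal_class)
    abnormal_total = len(labels) - normal_total
    normal_hits = sum(1 for y, p in zip(labels, preds)
                      if y == normal_class and p == y)
    abnormal_hits = sum(1 for y, p in zip(labels, preds)
                        if y != normal_class and p == y)
    specificity = normal_hits / normal_total
    sensitivity = abnormal_hits / abnormal_total
    return sensitivity, specificity, (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy example: 0 = normal, 1 = crackle, 2 = wheeze, 3 = both (assumed mapping).
    labels = [0, 0, 0, 0, 1, 2, 3, 1]
    preds = [0, 0, 0, 1, 1, 2, 0, 3]
    print(icbhi_score(labels, preds))
```

Because the two terms are averaged, a model that predicts "normal" for everything reaches a specificity of 1 but a sensitivity of 0, so its Score stays at 0.5.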

BibTeX

If you find this repo useful for your research, please consider citing our paper:

@article{bae2023patch,
  title={Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification},
  author={Bae, Sangmin and Kim, June-Woo and Cho, Won-Yang and Baek, Hyerim and Son, Soyoun and Lee, Byungjo and Ha, Changwan and Tae, Kyongpil and Kim, Sungnyun and Yun, Se-Young},
  journal={arXiv preprint arXiv:2305.14032},
  year={2023}
}

Contact