soham97 / PAM

PAM is a no-reference audio quality metric for audio generation tasks
MIT License

PAM: Prompting Audio-Language Models for Audio Quality Assessment

[Paper] [data]

PAM is a no-reference metric for assessing audio quality across different audio processing tasks. It prompts Audio-Language Models (ALMs) with an antonym prompt strategy to calculate an audio quality score. It requires neither reference data nor task-specific models, and it correlates well with human perception.
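The antonym prompt idea can be illustrated in a few lines. The sketch below is not PAM's implementation. It assumes an ALM that embeds audio and text into a shared space (as CLAP-style models do), takes those embeddings as plain lists of floats, and uses a hypothetical `logit_scale` temperature of 100. The score is the softmax probability given to the "good quality" prompt of an antonym pair, so it always lies between 0 and 1. The example prompts in the docstring are illustrative placeholders.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def antonym_quality_score(
    audio_emb: Sequence[float],
    positive_emb: Sequence[float],
    negative_emb: Sequence[float],
    logit_scale: float = 100.0,  # hypothetical CLAP-style temperature
) -> float:
    """Probability the ALM assigns to the positive (high-quality) prompt.

    positive_emb / negative_emb are text embeddings of an antonym pair,
    e.g. illustrative prompts such as "the sound is clear and clean" and
    "the sound is noisy and with artifacts".
    """
    s_pos = logit_scale * cosine_similarity(audio_emb, positive_emb)
    s_neg = logit_scale * cosine_similarity(audio_emb, negative_emb)
    # Two-way softmax, shifted by the max for numerical stability.
    m = max(s_pos, s_neg)
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


# Toy 2-D embeddings standing in for real ALM outputs.
print(antonym_quality_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # close to 1
print(antonym_quality_score([1.0, 1.0], [1.0, 0.0], [0.0, 1.0]))  # 0.5
```

Using a pair of opposing prompts instead of a single "high quality" prompt turns an unbounded similarity into a normalized probability, which is easier to compare across files.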

News

[Jul 24] Improved human correlation across tasks [commit]
[Mar 24] PAM is accepted at INTERSPEECH 2024

Setup

Open the Anaconda terminal and run:

> git clone https://github.com/soham97/PAM.git
> cd PAM 
> conda create -n pam python=3.10
> conda activate pam
> pip install -r requirements.txt
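After `pip install -r requirements.txt`, a quick sanity check can confirm that every listed package is visible inside the `pam` environment. The sketch below is a hypothetical helper, not part of the repository. It uses only the standard library (`importlib.metadata`) and a simplified requirements parser that skips pip options such as `-r` or `-e` and does not cover every pip syntax.

```python
import os
import re
import sys
from importlib import metadata
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Extract bare distribution names from requirements.txt content (simplified)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name ends at the first version specifier, marker, extra, or URL.
        name = re.split(r"[<>=!~;\[@ ]", line, maxsplit=1)[0].strip()
        if name:
            names.append(name)
    return names


def missing_packages(names: List[str]) -> List[str]:
    """Return the names that are not installed in the current interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])  # expect 3.10.x inside the pam env
    if os.path.exists("requirements.txt"):
        with open("requirements.txt", encoding="utf-8") as fh:
            names = parse_requirements(fh.read())
        print("Missing:", missing_packages(names) or "none")
```

Run it from the repository root with the `pam` environment active.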

Compute PAM

Folder evaluation

To compute PAM on a folder containing audio files, run:

> python run.py --folder {folder_path}

The symbol {..} indicates user input.
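Conceptually, folder evaluation gathers the audio files in `{folder_path}` and scores each one. The sketch below covers only the file-gathering step. `collect_audio_files` is a hypothetical helper, not a function from the repository, and the extension set is an assumption, so check `run.py` for the formats it actually accepts.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Assumed set of audio extensions; not taken from run.py.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}


def collect_audio_files(folder: str, extensions: Iterable[str] = AUDIO_EXTENSIONS) -> List[Path]:
    """Return audio files directly inside `folder` (non-recursive), sorted by path."""
    exts = {e.lower() for e in extensions}
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )


if __name__ == "__main__":
    # Throwaway folder: two audio files, one text file, one nested audio file.
    with tempfile.TemporaryDirectory() as d:
        for name in ["b.wav", "a.FLAC", "notes.txt"]:
            (Path(d) / name).write_text("")
        (Path(d) / "sub").mkdir()
        (Path(d) / "sub" / "c.wav").write_text("")
        print([p.name for p in collect_audio_files(d)])  # ['a.FLAC', 'b.wav']
```

Because the listing is non-recursive, `sub/c.wav` is ignored. Nested layouts are what the custom evaluation below addresses.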

Custom evaluation

To compute PAM on a hierarchy of folders or on multiple directories, we recommend creating a custom dataset.
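One way to build such a custom dataset is to walk each root directory recursively and record every audio file's path relative to its root, so scores can later be grouped by subfolder (for example, one subfolder per generation system). The class below is a hypothetical sketch. Its name, the `(group, path)` item format, and the extension set are assumptions. It only mirrors the `__len__`/`__getitem__` protocol used by PyTorch-style datasets, and it leaves out audio loading and resampling.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}  # assumed, not from the repo


class AudioTreeDataset:
    """Hypothetical dataset over every audio file under one or more root folders."""

    def __init__(self, roots: Iterable[str], extensions: Iterable[str] = AUDIO_EXTENSIONS):
        exts = {e.lower() for e in extensions}
        self.items: List[Tuple[Path, Path]] = []
        for root in map(Path, roots):
            # rglob("*") descends into all nested subfolders.
            for path in sorted(root.rglob("*")):
                if path.is_file() and path.suffix.lower() in exts:
                    self.items.append((root, path.relative_to(root)))

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Tuple[str, Path]:
        root, rel = self.items[idx]
        # Group label = top-level subfolder ("." for files directly in the root).
        group = rel.parts[0] if len(rel.parts) > 1 else "."
        return group, root / rel


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "sysA" / "deep").mkdir(parents=True)
        (root / "sysB").mkdir()
        (root / "sysA" / "deep" / "1.wav").write_text("")
        (root / "sysB" / "2.flac").write_text("")
        ds = AudioTreeDataset([d])
        print([ds[i][0] for i in range(len(ds))])  # ['sysA', 'sysB']
```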

Data

The manuscript uses data from multiple sources. It can be obtained as follows:

Paper reproduction

This section covers reproducing the numbers for text-to-audio and text-to-music. First, download the human listening test data by following the instructions in the Data section above. The download should contain a folder titled human_eval.

Then run the following command:

> python pcc.py --folder {folder_path}

where {folder_path} points to the human_eval folder.
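`pcc.py` measures how well PAM agrees with the human listening-test ratings. Its name suggests the Pearson correlation coefficient (PCC), which for paired metric scores $x_i$ and human ratings $y_i$ is $r = \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum_i (x_i-\bar x)^2}\,\sqrt{\sum_i (y_i-\bar y)^2}}$. The standalone sketch below computes that quantity. It is not the script's code, and how `pcc.py` reads `human_eval` and pairs scores with ratings is not shown. On Python 3.10, `statistics.correlation` returns the same value.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired metric scores x and human ratings y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and standard deviations share the same 1/n factor, so it cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / (sx * sy)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # approximately 1.0
```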

Citation

@article{deshmukh2024pam,
  title={PAM: Prompting Audio-Language Models for Audio Quality Assessment},
  author={Soham Deshmukh and Dareen Alharthi and Benjamin Elizalde and Hannes Gamper and Mahmoud Al Ismail and Rita Singh and Bhiksha Raj and Huaming Wang},
  journal={arXiv preprint arXiv:2402.00282},
  year={2024}
}