PAM is a no-reference metric for assessing audio quality across different audio processing tasks. It prompts Audio-Language Models (ALMs) with an antonym prompt strategy to compute an audio quality score. It requires no reference data or task-specific models and correlates well with human perception.
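At its core, the antonym prompt strategy compares an audio embedding against a "good quality" and a "bad quality" text prompt embedding and turns the two similarities into a single score. The following is a minimal sketch with toy embeddings, not PAM's actual implementation: the real pipeline uses a pretrained ALM, and the exact prompts and temperature are assumptions here.

```python
import numpy as np

def antonym_score(audio_emb, pos_emb, neg_emb, temperature=0.01):
    """Softmax over cosine similarities with an antonym prompt pair."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(audio_emb, pos_emb), cos(audio_emb, neg_emb)]) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs[0])  # probability mass on the "good quality" prompt

# Toy vectors standing in for ALM embeddings of the audio and of an
# antonym prompt pair (e.g. a "clear" vs. a "noisy" description).
rng = np.random.default_rng(0)
pos, neg = rng.normal(size=16), rng.normal(size=16)
clean = pos + 0.1 * rng.normal(size=16)  # audio close to the positive prompt
print(round(antonym_score(clean, pos, neg), 3))
```

The score lies in [0, 1]; audio whose embedding sits closer to the positive prompt receives a score near 1.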
[Jul 24] Improved human correlation across tasks [commit]
[Mar 24] PAM is accepted at INTERSPEECH 2024
Open the Anaconda terminal and run:
> git clone https://github.com/soham97/PAM.git
> cd PAM
> conda create -n pam python=3.10
> conda activate pam
> pip install -r requirements.txt
To compute PAM on a folder containing audio files, you can directly run:
> python run.py --folder {folder_path}
The symbol {..} indicates user input.
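run.py scans the given folder for audio files before scoring them. As a hedged illustration of that collection step (the function name and supported extensions below are assumptions, loosely mirroring the repo's get_filelist), a recursive variant that also handles nested folders:

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".flac", ".mp3"}  # assumed set of supported formats

def collect_audio_files(root: str) -> list[str]:
    """Recursively gather audio file paths under root, sorted for determinism."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.suffix.lower() in AUDIO_EXTS
    )
```

The same pattern can back a custom get_filelist when your audio sits in a folder hierarchy rather than a single flat directory.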
To compute PAM on a hierarchy of folders or multiple directories, we recommend creating a custom dataset:
1. In dataset.py, create a custom dataset by inheriting from AudioDataset, similar to ExampleDataset.
2. Modify the get_filelist function to fit your directory structure.
3. Run run.py with your custom dataset and make changes to the evaluation if needed.

The manuscript uses data from multiple sources. It can be obtained as follows:
This section covers reproducing the numbers for text-to-audio and text-to-music. First, download the human listening test data by following the instructions listed above. The download should contain a folder titled human_eval.
Then run the following command:
> python pcc.py --folder {folder_path}
where {folder_path} points to the human_eval folder.
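pcc.py reports correlation with the human ratings; the underlying quantity is the Pearson correlation coefficient between PAM scores and human scores. A self-contained sketch (the numbers below are toy values for illustration, not data from the listening test):

```python
import numpy as np

def pearson_cc(pam_scores, human_ratings):
    """Pearson correlation coefficient between two score lists."""
    x = np.asarray(pam_scores, dtype=float)
    y = np.asarray(human_ratings, dtype=float)
    x, y = x - x.mean(), y - y.mean()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

pam = [0.91, 0.40, 0.75, 0.22]  # toy PAM scores per audio clip
mos = [4.5, 2.1, 3.8, 1.5]      # toy mean-opinion scores for the same clips
print(round(pearson_cc(pam, mos), 3))
```

A coefficient near 1 indicates that PAM ranks the clips in close agreement with the human listeners.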
@article{deshmukh2024pam,
title={PAM: Prompting Audio-Language Models for Audio Quality Assessment},
author={Soham Deshmukh and Dareen Alharthi and Benjamin Elizalde and Hannes Gamper and Mahmoud Al Ismail and Rita Singh and Bhiksha Raj and Huaming Wang},
journal={arXiv preprint arXiv:2402.00282},
year={2024}
}