Public child-adult speaker diarization (classification) model and code, trained using simulated conversations.
Clone the repository and install the requirements:

```bash
git clone https://github.com/usc-sail/child-adult-diarization.git
cd child-adult-diarization/whisper-modeling
pip install -r requirements.txt
```
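Example code to load the pretrained model and run it on a dummy input:

```python
from models.whisper import WhisperWrapper
import torch

model = WhisperWrapper()
# Truncate the positional embeddings to the first 500 positions
# (500 Whisper encoder frames = 10 s of audio at 20 ms per frame)
model.backbone_model.encoder.embed_positions = model.backbone_model.encoder.embed_positions.from_pretrained(model.embed_positions[:500])
# Load the pretrained checkpoint (replace with the path to the downloaded file)
model.load_state_dict(torch.load("path/to/whisper-base_rank8_pretrained_50k.pt"))
model.cuda()

# Dummy input: 10 s of silence at 16 kHz, shape [batch, num_samples]
test_data = torch.zeros([1, 160000]).cuda()
output = model.forward_eval(test_data)
```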
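The example above feeds a single 10 s window (160000 samples at 16 kHz). A minimal sketch for preparing a longer recording in the same shape follows, assuming the audio is already at 16 kHz and loaded as a float array (for example with `soundfile.read`, which returns multi-channel audio as `(num_samples, num_channels)`). The helpers `to_mono` and `split_into_chunks` are hypothetical and not part of this repository; resampling is not handled.

```python
import numpy as np

SAMPLE_RATE = 16_000              # Whisper models expect 16 kHz audio
CHUNK_SAMPLES = 10 * SAMPLE_RATE  # 160000 samples, matching the example input above


def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Average the channels of a (num_samples, num_channels) array; pass 1-D input through."""
    if waveform.ndim == 2:
        return waveform.mean(axis=1)
    return waveform


def split_into_chunks(waveform: np.ndarray, chunk_samples: int = CHUNK_SAMPLES) -> np.ndarray:
    """Zero-pad a 1-D waveform to a multiple of chunk_samples and
    reshape it to (num_chunks, chunk_samples)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    num_chunks = max(1, int(np.ceil(len(waveform) / chunk_samples)))
    padded = np.zeros(num_chunks * chunk_samples, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(num_chunks, chunk_samples)
```

Each row can then be converted with `torch.from_numpy(chunk).unsqueeze(0).cuda()` to get the `[1, 160000]` shape used above. Zero-padding the last chunk keeps every input at the length the truncated positional embeddings expect, at the cost of some trailing silence in the final window.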
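Example code to map the frame-level outputs to child, adult, and overlap timestamps:

```python
from scripts.convert_output import get_timestamps, majority_filter

# Smooth the frame-level predictions with a majority filter
output = majority_filter(output)
# Convert the smoothed frame labels into child, adult, and overlap timestamps
output = get_timestamps(output)
```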
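As a rough illustration of what this post-processing step does (not the repository's `majority_filter` or `get_timestamps`, whose exact behavior may differ), the sketch below smooths integer frame labels with a sliding-window majority vote and then merges runs of identical labels into `(label, start, end)` segments. It assumes a hypothetical label mapping (0 = silence, 1 = child, 2 = adult, 3 = overlap) and a 20 ms frame hop, which is the Whisper encoder's frame rate; check `scripts/convert_output.py` for the actual label order and frame rate.

```python
from collections import Counter
from typing import List, Tuple

FRAME_SECONDS = 0.02  # assumption: one output frame per Whisper encoder frame (20 ms)
LABEL_NAMES = {1: "child", 2: "adult", 3: "overlap"}  # hypothetical mapping; 0 = silence


def mode_filter(labels: List[int], window: int = 5) -> List[int]:
    """Replace each label with the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        counts = Counter(labels[max(0, i - half): i + half + 1])
        best = max(counts.values())
        # Keep the current label when it ties for the most common one
        if counts[current] == best:
            smoothed.append(current)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def frames_to_segments(labels: List[int],
                       frame_seconds: float = FRAME_SECONDS) -> List[Tuple[str, float, float]]:
    """Merge runs of identical labels into (name, start_s, end_s); silence runs are dropped."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the sequence or where the label changes
        if i == len(labels) or labels[i] != labels[start]:
            name = LABEL_NAMES.get(labels[start])
            if name is not None:
                segments.append((name, round(start * frame_seconds, 3), round(i * frame_seconds, 3)))
            start = i
    return segments
```

For example, `frames_to_segments(mode_filter([1, 1, 2, 1, 1, 2, 2, 2, 0, 0], window=3))` first removes the isolated adult frame and then yields one child segment followed by one adult segment. A majority vote is a common choice here because it suppresses single-frame label flips without shifting segment boundaries much.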
If you use this code or model, please cite:

```bibtex
@article{xu2024data,
  title = {Data Efficient Child-Adult Speaker Diarization with Simulated Conversations},
  author = {Anfeng Xu and Tiantian Feng and Helen Tager-Flusberg and Catherine Lord and Shrikanth Narayanan},
  year = {2024},
  journal = {arXiv preprint arXiv:2409.08881},
  url = {https://arxiv.org/abs/2409.08881},
}

@inproceedings{xu24c_interspeech,
  title = {Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions},
  author = {Anfeng Xu and Kevin Huang and Tiantian Feng and Lue Shen and Helen Tager-Flusberg and Shrikanth Narayanan},
  year = {2024},
  booktitle = {Interspeech 2024},
  pages = {5193--5197},
  doi = {10.21437/Interspeech.2024-717},
  issn = {2958-1796},
}
```
Please raise an issue or contact anfengxu@usc.edu for any questions.