pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
6.28k stars 776 forks

Useful multi-label models #1027

Closed hbredin closed 1 year ago

hbredin commented 2 years ago

I'd like to pretrain a couple of models to host them on Hugging Face along with the others. What kind of classes/training datasets would you suggest? I was thinking about MALE/FEMALE using AMI/VoxCeleb, but that's it. Any other ideas?

Originally posted by @hadware in https://github.com/pyannote/pyannote-audio/issues/891#issuecomment-1172205394

hadware commented 2 years ago

Sidenote: we may be able to pretrain a MALE/FEMALE/CHILD/KEYCHILD model using our internal data, but whether this is possible depends on the severely restrictive (and rightfully so) terms governing that data. This could be a really nice addition for the child language acquisition community.

hbredin commented 2 years ago

I can think of several others:

Note that the new MultiLabelSegmentation task requires annotations with fine (start time, end time) boundaries. However, neither VoxCeleb nor MUSAN provides this kind of annotation (each file contains only one class), so MultiLabelSegmentation is probably not a good choice for this kind of dataset. That is also the reason why I renamed the task to segmentation (rather than detection).
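To illustrate the difference between fine boundaries and file-level labels, here is a minimal sketch in plain Python. It does not use the pyannote.audio API: the `frame_targets` helper, the 0.5 s frame step, and the toy segments are all hypothetical, chosen only to show what frame-level multi-label targets look like in each case.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start, end, label), times in seconds.
Segment = Tuple[float, float, str]


def frame_targets(
    segments: List[Segment],
    classes: List[str],
    duration: float,
    step: float = 0.5,
) -> List[List[int]]:
    """Return one multi-hot row per frame of `step` seconds.

    A class is marked active in a frame when the frame center
    falls inside one of the segments carrying that class label.
    """
    num_frames = int(round(duration / step))
    targets = [[0] * len(classes) for _ in range(num_frames)]
    for start, end, label in segments:
        column = classes.index(label)
        for frame in range(num_frames):
            center = (frame + 0.5) * step
            if start <= center < end:
                targets[frame][column] = 1
    return targets


classes = ["SPEECH", "MUSIC"]

# Fine (start time, end time) boundaries: classes switch on and off,
# and may overlap, within a single file.
fine = [(0.0, 1.0, "SPEECH"), (0.5, 2.0, "MUSIC")]
fine_targets = frame_targets(fine, classes, duration=2.0)

# File-level label: the whole file carries a single class,
# so every frame gets the same target row.
file_level = [(0.0, 2.0, "MUSIC")]
file_targets = frame_targets(file_level, classes, duration=2.0)
```

With file-level labels every frame of a file shares the same target, so there are no within-file boundaries for a segmentation model to learn from.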

hbredin commented 2 years ago

Laughter detection

manish-kumar-iisc commented 2 years ago

@hbredin here is a dataset. It also has overlap annotations, though I am not sure about the annotation quality. It could be used for SPEECH vs. MUSIC vs. NOISE. See if it is useful.

hbredin commented 2 years ago

PodcastFillers

hbredin commented 2 years ago

LibriStutter

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

hbredin commented 1 year ago

VocalSound

hbredin commented 1 year ago

EpicSounds
