pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

http://pyannote.github.io

MIT License

Useful multi-label models #1027

Closed hbredin closed 1 year ago

hbredin commented 2 years ago

I'd like to pretrain a couple of models to host them on Hugging Face along with the others. What kind of classes/training datasets would you suggest? I was thinking about MALE/FEMALE using AMI/VoxCeleb, but that's it. Any other ideas?

Originally posted by @hadware in https://github.com/pyannote/pyannote-audio/issues/891#issuecomment-1172205394

hadware commented 2 years ago

Sidenote: we may be able to pretrain a MALE/FEMALE/CHILD/KEYCHILD model using our internal data, but this depends on the severely restrictive (and rightfully so) nature of the data. This could be a really nice addition to the child language acquisition community.

hbredin commented 2 years ago

I can think of several others:

SPEECH vs. MUSIC vs. NOISE with MUSAN or AVASpeech - SMAD or TVSM
Stuttering event detection with Sep-28k
Sheldon vs. Leonard vs. Penny using Bazinga (could be used for a demo: which character of The Big Bang Theory are you?)

Note that the new MultiLabelSegmentation task requires annotations with fine (start time, end time) boundaries. However, neither VoxCeleb nor MUSAN provides this kind of annotation (each file contains only one class), so MultiLabelSegmentation is probably not a good choice for this kind of dataset. That is also why I renamed the task to segmentation (rather than detection).
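To make the difference concrete, here is a minimal sketch in plain NumPy (not the pyannote.audio API) of how (start time, end time, label) annotations turn into frame-level multi-label targets. The helper `frame_targets`, the 20 ms frame step, and the toy segments are all assumptions made up for illustration; the actual task may build its targets differently.

```python
import numpy as np


def frame_targets(segments, classes, duration, frame_step=0.02):
    """Build a (num_frames, num_classes) multi-hot matrix.

    `segments` is a list of (start, end, label) tuples in seconds.
    Hypothetical helper: frame_step=0.02 is an arbitrary choice.
    """
    num_frames = int(round(duration / frame_step))
    targets = np.zeros((num_frames, len(classes)), dtype=np.int8)
    for start, end, label in segments:
        first = int(round(start / frame_step))
        last = min(int(round(end / frame_step)), num_frames)
        targets[first:last, classes.index(label)] = 1
    return targets


classes = ["SPEECH", "MUSIC", "NOISE"]

# Finely annotated file: classes start, stop and overlap within the file.
fine = frame_targets(
    [(0.0, 1.0, "SPEECH"), (0.5, 2.0, "MUSIC")], classes, duration=2.0
)

# File-level label (one class per file): every frame gets the same target.
coarse = frame_targets([(0.0, 2.0, "MUSIC")], classes, duration=2.0)

# Number of distinct frame-level targets in each case.
fine_patterns = np.unique(fine, axis=0).shape[0]
coarse_patterns = np.unique(coarse, axis=0).shape[0]
```

With file-level labels every frame carries the same target, so the model never sees a class boundary or an overlap inside a file, which is what a segmentation task is meant to learn.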

hbredin commented 2 years ago

sound event detection with AudioSet (and its strong labels)
VGGSound

hbredin commented 2 years ago

Laughter detection

manish-kumar-iisc commented 2 years ago

@hbredin here is a dataset. It also has overlap annotations, though I am not sure about the annotation quality. It could be used for SPEECH vs. MUSIC vs. NOISE. See if it is useful.

hbredin commented 2 years ago

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

hbredin commented 1 year ago

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.