Closed hbredin closed 1 year ago
Sidenote: we may be able to pretrain a MALE/FEMALE/CHILD/KEYCHILD model using our internal data, but this depends on the severely restrictive (and rightfully so) nature of the data. This could be a really nice addition to the child language acquisition community.
I can think of several others:
SPEECH vs. MUSIC vs. NOISE with MUSAN or AVASpeech-SMAD or TVSM
Sheldon vs. Leonard vs. Penny using Bazinga (could be used for a demo: which one of The Big Bang Theory characters are you?)
Note that the new MultiLabelSegmentation task requires annotations with fine (start time, end time) boundaries. However, neither VoxCeleb nor MUSAN provides this kind of annotation (as one file contains only one class). This means that MultiLabelSegmentation is probably not a good choice for this kind of dataset. That is also the reason why I renamed the task to segmentation (rather than detection).
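To make the point about fine boundaries concrete, here is a minimal sketch in plain Python. It is not pyannote.audio code: `LabeledSegment`, `frame_targets`, and the 0.5 s frame step are hypothetical names and values chosen for illustration only. It shows that segment-level annotations give frame-wise multi-label targets that change over time (including overlap), while a file-level label gives the same target for every frame, so there are no boundaries to learn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSegment:
    """Hypothetical container for one (start time, end time, label) annotation."""
    start: float  # seconds
    end: float    # seconds
    label: str


def frame_targets(segments, classes, duration, step=0.5):
    """Build one multi-hot row per frame of `step` seconds.

    A frame is marked active for a class when the frame center falls
    inside a segment carrying that class label.
    """
    num_frames = int(round(duration / step))
    targets = [[0] * len(classes) for _ in range(num_frames)]
    for seg in segments:
        k = classes.index(seg.label)
        for i in range(num_frames):
            center = (i + 0.5) * step
            if seg.start <= center < seg.end:
                targets[i][k] = 1
    return targets


classes = ["SPEECH", "MUSIC", "NOISE"]

# Fine annotations: speech for the first second, music overlapping from 0.5 s on.
fine = [
    LabeledSegment(0.0, 1.0, "SPEECH"),
    LabeledSegment(0.5, 2.0, "MUSIC"),
]
fine_targets = frame_targets(fine, classes, duration=2.0)

# File-level annotation: the whole file carries a single class.
file_level = [LabeledSegment(0.0, 2.0, "MUSIC")]
file_targets = frame_targets(file_level, classes, duration=2.0)

print(fine_targets)  # rows differ from frame to frame
print(file_targets)  # every row is identical
```

With file-level labels, the targets reduce to a per-file classification problem, which is why a segmentation task adds little for datasets annotated that way.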
Laughter detection
@hbredin here is a dataset. It also has overlap annotations, though I am not sure about the quality of the annotations. It could be used for SPEECH vs. MUSIC vs. NOISE. See if it is useful.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'd like to pretrain a couple of models to host them on Hugging Face along with the others. What kind of classes/train datasets would you suggest? I was thinking about MALE/FEMALE using AMI/VoxCeleb, but that's it. Any other ideas?
Originally posted by @hadware in https://github.com/pyannote/pyannote-audio/issues/891#issuecomment-1172205394