wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

8000 hz data #129

EmreOzkose closed this issue 1 year ago

EmreOzkose commented 1 year ago

Hi,

Do you have any advice or experience with training on 8000 Hz data? Is it enough to change the sampling rate in the config, or should I also decrease the mel size from 80 to 40?

JiJiJiang commented 1 year ago

Hello, set resample_rate to 8000 and training is ready to go. All data will be down-sampled to 8 kHz on the fly during training, whether 80-dim or 40-dim features are used.
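To see what the 80-dim versus 40-dim choice means at 8 kHz, here is a minimal sketch that lists mel filter center frequencies. It is not taken from wespeaker: the function names are hypothetical, the mel formula is the common 1127·ln(1 + f/700) form, and the 20 Hz lower edge is an assumed default. The only hard fact it relies on is that 8 kHz audio carries content up to the 4000 Hz Nyquist limit, so either bin count divides the same 0 to 4000 Hz band, just more or less finely.

```python
import math


def hz_to_mel(hz):
    # Common mel-scale formula (assumed here, not read from wespeaker).
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_center_frequencies(sample_rate, num_mel_bins, low_freq=20.0):
    """Center frequencies (Hz) of triangular mel filters up to Nyquist.

    low_freq=20.0 is an assumed lower edge for illustration.
    """
    nyquist = sample_rate / 2.0
    mel_low = hz_to_mel(low_freq)
    mel_high = hz_to_mel(nyquist)
    # num_mel_bins filters need num_mel_bins + 2 evenly spaced mel points;
    # the interior points are the filter centers.
    step = (mel_high - mel_low) / (num_mel_bins + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_mel_bins)]


if __name__ == "__main__":
    for bins in (80, 40):
        centers = mel_center_frequencies(8000, bins)
        print(bins, "bins: first center", round(centers[0], 1),
              "Hz, last center", round(centers[-1], 1), "Hz")
```

Both settings stay below 4000 Hz; the 40-bin variant simply uses wider filters.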

EmreOzkose commented 1 year ago

Thank you @JiJiJiang. Should I decrease num_frame from 600 to 300 for the training to be meaningful?

EmreOzkose commented 1 year ago

Or should I increase the learning rate by 3 times if I have 3x the number of samples in voxceleb2-dev? Do you have any experiments with other datasets?

JiJiJiang commented 1 year ago

No, you shouldn't. The number of training samples per waveform is calculated automatically from num_frms and resample_rate.
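As a rough illustration of why num_frms can stay the same, the sketch below uses the standard framing relation between frame count and waveform length. The helper name is hypothetical, and the 10 ms frame shift and 25 ms frame length are assumed values, not settings confirmed in this thread; the exact formula wespeaker uses may differ.

```python
def chunk_len_samples(num_frms, resample_rate,
                      frame_shift_ms=10, frame_length_ms=25):
    """Waveform samples needed to produce num_frms feature frames.

    frame_shift_ms and frame_length_ms are assumed defaults.
    """
    # N frames span (N - 1) shifts plus one full frame length.
    duration_ms = (num_frms - 1) * frame_shift_ms + frame_length_ms
    return duration_ms * resample_rate // 1000


if __name__ == "__main__":
    for rate in (16000, 8000):
        n = chunk_len_samples(600, rate)
        print(rate, "Hz:", n, "samples =", n / rate, "seconds")
```

Under these assumptions, halving the sampling rate halves the number of waveform samples per chunk, while the chunk duration in seconds and the number of frames are unchanged.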

JiJiJiang commented 1 year ago

> Or should I increase the learning rate by 3 times if I have 3x the number of samples in voxceleb2-dev? Do you have any experiments with other datasets?

Joint training with other datasets should be fine. My suggestion is to keep the same lr and use fewer epochs (e.g., 150 => 120 or 100).
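One way to read this suggestion, sketched below, is in terms of total optimizer steps. This reasoning is not stated in the thread. The sample count and batch size are placeholders rather than real VoxCeleb2 or wespeaker values, and the sketch assumes one epoch is one full pass over the data.

```python
import math


def total_steps(num_samples, num_epochs, batch_size):
    """Total optimizer steps, assuming one epoch is one full pass."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    base_samples = 1_000_000  # placeholder, not the real dataset size
    batch_size = 256          # placeholder

    baseline = total_steps(base_samples, 150, batch_size)
    for epochs in (120, 100):
        larger = total_steps(3 * base_samples, epochs, batch_size)
        print(epochs, "epochs on 3x data:",
              round(larger / baseline, 2), "x baseline steps")
```

Under these assumptions, even 100 epochs on three times the data gives about twice as many updates as 150 epochs on the original set, so the model still trains for longer overall at the same lr.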