qiuqiangkong / audioset_tagging_cnn

MIT License

Cnn14_DecisionLevelMax for 16kHz audio #58

Open kremnik opened 1 year ago

kremnik commented 1 year ago

Hello. Could you tell me whether there are any plans to publish a Sound Event Detection (SED) model for 16 kHz audio recordings? I'm trying to use the Cnn14_DecisionLevelMax model with the following parameters:

sample_rate = 16000
window_size = 512
hop_size = 160
mel_bins = 64
fmin = 50
fmax = 8000

However, I get an error related to the window_size parameter: it apparently must be 1024. A window that large is too long for 16 kHz audio. Is there any solution other than training the SED model from scratch on 16 kHz audio recordings?
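
To make the window-size concern concrete, here is a small sketch of the arithmetic behind it. It only converts the sample counts listed above into milliseconds; the helper name `duration_ms` is made up for illustration, and the 32 kHz comparison at the end is an assumption about the rate the released checkpoint targets, not something confirmed in this thread.

```python
def duration_ms(num_samples: int, sample_rate: int) -> float:
    """Convert a length in samples to a duration in milliseconds."""
    return 1000.0 * num_samples / sample_rate


sample_rate = 16000  # from the parameters above
hop_size = 160       # from the parameters above

# Compare the window I want (512) with the one the model seems to require (1024).
for window_size in (512, 1024):
    ms = duration_ms(window_size, sample_rate)
    print(f"window_size={window_size} at {sample_rate} Hz: {ms:.1f} ms")

print(f"hop_size={hop_size} at {sample_rate} Hz: {duration_ms(hop_size, sample_rate):.1f} ms")

# Assumption: if the checkpoint was trained on 32 kHz audio, a 1024-sample
# window there covers the same time span as 512 samples at 16 kHz.
print(f"window_size=1024 at 32000 Hz: {duration_ms(1024, 32000):.1f} ms")
```

So at 16 kHz a 512-sample window spans 32 ms, while a 1024-sample window spans 64 ms, which is twice the analysis window I was aiming for.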