sharathadavanne / sed-crnn

Single and multichannel sound event detection using convolutional recurrent neural networks. DCASE 2017 real-life sound event detection winning method.

RNN output shape #1

Closed: weiwei-ww closed this issue 5 years ago

weiwei-ww commented 5 years ago

Hi,

I got a little confused by the figure in the README. It says the output shape of the RNN is 256x64, but shouldn't it be 256x32, since the last RNN layer has 32 units?

sharathadavanne commented 5 years ago

Hi @weiwei-ww, thanks for checking out this repository. The output is 256x64 because the RNN is bidirectional, so it returns a 256x32 output for each of the forward and backward directions.

weiwei-ww commented 5 years ago

Hi, thanks for your quick reply. However, I also noticed that you used the "mul" merge mode for the bidirectional RNN layers. Therefore, I believe the output dimension should still be 32 instead of 64? By the way, are there any particular reasons for using "mul" instead of "concat" as the merge mode?
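A small NumPy sketch of what the two merge modes do to the shapes discussed here. It assumes 256 time frames and 32 GRU units per direction (the numbers from this thread); random arrays stand in for real GRU outputs, and `merge` is a hypothetical helper that mimics the usual bidirectional merge options, not code from the repository.

```python
import numpy as np

T, UNITS = 256, 32  # time frames and GRU units per direction (from the thread)

rng = np.random.default_rng(0)
# Stand-ins for the per-frame outputs of the forward and backward GRUs.
# In a real bidirectional layer the backward output is flipped back in time
# before merging so that row t of both arrays refers to the same frame.
forward = rng.standard_normal((T, UNITS))
backward = rng.standard_normal((T, UNITS))


def merge(fwd, bwd, mode):
    """Combine per-direction outputs like a bidirectional wrapper's merge modes."""
    if mode == "concat":
        return np.concatenate([fwd, bwd], axis=-1)  # widths add: 32 + 32 = 64
    if mode == "mul":
        return fwd * bwd  # element-wise product: width stays 32
    if mode == "sum":
        return fwd + bwd
    if mode == "ave":
        return (fwd + bwd) / 2
    raise ValueError(f"unknown merge mode: {mode}")


for mode in ("concat", "mul"):
    print(mode, merge(forward, backward, mode).shape)
# concat (256, 64)
# mul (256, 32)
```

Only "concat" doubles the feature width; "mul", "sum" and "ave" all keep it at the per-direction size, which is why a figure showing 256x64 is inconsistent with a "mul" layer.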

sharathadavanne commented 5 years ago

@weiwei-ww thanks for pointing it out :) You are right, the code uses 'mul' mode for the bidirectional GRU, so the figure is definitely misleading. I will change it at some point. But I must mention that for the DCASE 2017 dataset the performance did not vary much between 'mul' and 'concat'. Since the overall number of weights in the model using 'mul' is smaller, we chose it over 'concat'.
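A rough parameter count shows where the saving from 'mul' comes from. The bidirectional layer itself has the same weights in either mode; what shrinks is the input width of the layer that follows it. This sketch assumes a second 32-unit bidirectional GRU stacked on the first, and uses the standard GRU count with one bias vector per gate (Keras `reset_after=False`). `gru_params` and `bigru_params` are hypothetical helpers, not part of the repository.

```python
def gru_params(input_dim, units, reset_after=False):
    """Trainable weights in one GRU direction (update, reset, candidate gates).

    Each gate has an input kernel (input_dim x units), a recurrent kernel
    (units x units) and a bias; reset_after=True uses two biases per gate.
    """
    bias = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + bias)


def bigru_params(input_dim, units, reset_after=False):
    """A bidirectional wrapper holds two independent GRUs, one per direction."""
    return 2 * gru_params(input_dim, units, reset_after)


UNITS = 32  # hidden size of each GRU direction, as in the thread

# Output width of the first bidirectional layer under each merge mode.
merged_dim = {"mul": UNITS, "concat": 2 * UNITS}

for mode, dim in merged_dim.items():
    # Parameters of the next 32-unit bidirectional GRU fed by that output.
    print(mode, dim, bigru_params(dim, UNITS))
# mul 32 12480
# concat 64 18624
```

Under these assumptions 'concat' adds 6,144 weights to each following 32-unit bidirectional GRU layer, and any dense layer after the last RNN would likewise see twice as many inputs.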

weiwei-ww commented 5 years ago

Thanks for your detailed answer!

sharathadavanne commented 5 years ago

@weiwei-ww The figure has now been updated