yannickl96 closed this issue 3 years ago
Hi Yannick,
Happy to help out in that case!
You're very close with the architecture. One important detail, though, is to give your dropout mask a specific shape, one that corresponds to marginalizing out all components of a dropped-out variable simultaneously. In this case, that means passing `noise_shape=(28, 28, 1)` to `keras.layers.Dropout`. The final axis corresponds to the components, and since the 'noise tensor' in the dropout layer will have just one value along that axis, it gets broadcast to all channels/components. Hope this helps :)
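To make the broadcasting concrete, here is a minimal NumPy sketch of what a `noise_shape=(28, 28, 1)` mask does inside a dropout layer (the shapes and dropout rate are illustrative, not taken from the paper):

```python
import numpy as np

# Sketch of dropout with noise_shape=(28, 28, 1): one Bernoulli draw per
# spatial location, broadcast over the channel axis, so all components of
# a variable are dropped (or kept) together.
rng = np.random.default_rng(seed=0)
rate = 0.25
x = np.ones((2, 28, 28, 3))               # (batch, height, width, channels)
mask = rng.random((28, 28, 1)) >= rate    # shape (28, 28, 1): shared over channels
y = x * mask / (1.0 - rate)               # inverted-dropout scaling, as Keras does
```

At every spatial location, either all channels are zeroed or none are, which is exactly the "marginalize all components simultaneously" behavior.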
Hi Jos,
Thanks for your quick reply! Just one further question: in the paper you mention using the default Keras settings for the Adam optimizer (in particular a learning rate of 1e-4). However, I noticed that the default learning rate in Keras is actually 1e-3. Did that change over time, or is there a typo in the paper?
Cheers!
Hi Yannick,
Thanks for pointing that out. The value we actually used was 1e-4, but there's a good chance you'll get similar performance with 1e-3.
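For reference, a minimal sketch of setting the non-default learning rate explicitly (the tiny model here is a placeholder, not the DGC-SPN architecture):

```python
import tensorflow as tf

# Placeholder model; the point is only the optimizer configuration.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Dropout(0.2, noise_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Pass the paper's 1e-4 explicitly rather than relying on the Keras
# default, which is 1e-3.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
)
```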
Thanks for the reply, worked like a charm! :)
Will close it for now, but feel free to reopen if another related problem appears.
Hello,
I am currently trying to replicate the SPN architecture from the DGC-SPN paper for image classification; however, I'm struggling to apply the input dropout. Could you give me a hint as to whether my code is on the right track?
Any advice would be greatly appreciated!
Best regards!