Three questions and suggestions

I have some questions about the DCC method:

I have read that nowadays, thanks to dropout and ReLU, layer-wise pretraining of the autoencoder is no longer necessary (see the ReLU paper). If layer-wise pretraining can be skipped, it would save a lot of training time.
A denoising autoencoder can improve clustering performance. The DCC model uses dropout as its denoising layers. In my experience, though, for some numerical data, such as the protein data in the RCC paper, adding Gaussian noise works better. So why not use both Gaussian noise and dropout?
How can I extract the clusters learned by DCC? That is, after training the DCC model, how do I get the cluster assignment of each sample? Thanks!
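
To make the second point concrete, here is a minimal sketch of what I mean by combining the two corruptions. It is written in plain NumPy so it does not depend on any framework, and it is not taken from the DCC code: the function name `corrupt` and the parameters `noise_std` and `drop_prob` are my own, and the default values are arbitrary.

```python
import numpy as np

def corrupt(x, noise_std=0.1, drop_prob=0.2, rng=None):
    """Corrupt an input batch with Gaussian noise followed by dropout.

    Hypothetical helper for illustration only; drop_prob must be < 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    # Additive zero-mean Gaussian noise on every feature.
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    # Dropout: zero each entry with probability drop_prob and rescale
    # the surviving entries so the expected value is unchanged.
    keep = rng.random(x.shape) >= drop_prob
    return noisy * keep / (1.0 - drop_prob)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))      # stand-in for a batch of numerical features
corrupted = corrupt(batch, rng=rng)  # what the autoencoder would see as input
```

The autoencoder would be trained to reconstruct `batch` from `corrupted`. Setting `noise_std=0` gives dropout only, and setting `drop_prob=0` gives Gaussian noise only, so the two can be compared on the same data.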

shahsohil / DCC