slim1017 / VaDE

Python code for the paper "Variational Deep Embedding: A Generative Approach to Clustering"

Pretraining #5

Open erlebach opened 6 years ago

erlebach commented 6 years ago

Hi, my team and I are trying to reproduce the results of your paper, but cannot. Would it be possible to get access to the code you used for pretraining? That would help us a lot. Thank you.

michelleowen commented 6 years ago

Hi, I am also interested in your pre-training code. I did pre-training based on the description in your paper. However, with my pre-training, the gamma output always assigns the same class to all data points.

michelleowen commented 6 years ago

Also, why do you assign the weights from one layer earlier in the pretrained AE to the layers in VaDE, as below?

    vade.layers[1].set_weights(ae.layers[0].get_weights())
    vade.layers[2].set_weights(ae.layers[1].get_weights())
    vade.layers[3].set_weights(ae.layers[2].get_weights())
    vade.layers[4].set_weights(ae.layers[3].get_weights())

Why not

    vade.layers[1].set_weights(ae.layers[1].get_weights())
    vade.layers[2].set_weights(ae.layers[2].get_weights())
    vade.layers[3].set_weights(ae.layers[3].get_weights())
    vade.layers[4].set_weights(ae.layers[4].get_weights())

if the pretrained AE has the same network architecture as VaDE?
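A likely explanation for the one-layer offset, assuming the VaDE model is built with the Keras functional API while the AE is a Sequential model: a functional model lists its InputLayer at `layers[0]`, so its first Dense layer sits at index 1, whereas a Sequential model's first Dense layer sits at index 0. A minimal sketch of the difference:

```python
from keras.layers import Input, Dense
from keras.models import Model, Sequential

# Sequential AE: no explicit InputLayer in .layers
ae = Sequential()
ae.add(Dense(500, input_dim=784, activation='relu'))
print(type(ae.layers[0]).__name__)    # Dense

# Functional model (as VaDE appears to be built): layers[0] is the InputLayer
x = Input(shape=(784,))
vade = Model(x, Dense(500, activation='relu')(x))
print(type(vade.layers[0]).__name__)  # InputLayer
print(type(vade.layers[1]).__name__)  # Dense
```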

eelxpeng commented 6 years ago

I am also having trouble replicating the results. Using the provided pretrained weights works fine, except for the HAR dataset. But using the pretraining code from DEC-keras, which achieves good results for AE+k-means and DEC, does not make the VaDE model work. Also, the code for the HAR dataset fixes the random state of the GMM, which shouldn't be done. Removing the random-state specification and repeating many times, the performance is significantly lower than the reported result. Is the author using the pretraining code from the original DEC implementation? If not, could you provide it?
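For illustration, the point about the random state amounts to something like the following (a sketch with the current scikit-learn class, not necessarily the repo's exact call; the repo may use an older GMM API):

```python
from sklearn.mixture import GaussianMixture

# Leaving random_state unset lets each run draw a fresh initialization,
# which is what you want when averaging performance over repeated runs.
gmm = GaussianMixture(n_components=6, covariance_type='diag', n_init=10)
# gmm.fit(z)  # z: latent codes from the pretrained encoder
```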

ttgump commented 6 years ago

@michelleowen I think they are using a Sequential model, based on their json file. So the architecture of the ae is:

    ae = Sequential()
    ae.add(Dense(intermediate_dim[0], input_dim=original_dim, activation='relu'))
    ae.add(Dense(intermediate_dim[1], activation='relu'))
    ae.add(Dense(intermediate_dim[2], activation='relu'))
    ae.add(Dense(latent_dim))
    ae.add(Dense(intermediate_dim[2], activation='relu'))
    ae.add(Dense(intermediate_dim[1], activation='relu'))
    ae.add(Dense(intermediate_dim[0], activation='relu'))
    ae.add(Dense(original_dim))

But even when I pretrain this autoencoder first, I get the same problem: the gamma output always assigns the same class to all data points. So I guess the authors used some other technique to pretrain the ae.

wangmn93 commented 6 years ago

They use a VAE or AAE to pretrain the model. You need to constrain the latent space with a KL divergence term in the loss (or use the discriminator in an AAE). I have tried a VAE for pretraining; the accuracy after 200 epochs is 86% on MNIST. The range of the latent space is about -5 to 5, while the range of the latent space of the provided pretrained weights is about -3 to 3. If you can further shrink the range of the latent space, I think the result will match theirs.
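For reference, a minimal sketch of this kind of VAE pretraining (my own reading of the suggestion, not the authors' code; it assumes classic graph-mode Keras and the 784-500-500-2000-10 MNIST architecture from the paper, with `beta` as a hypothetical knob for weighting the KL term):

```python
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K

original_dim, latent_dim = 784, 10
inter = [500, 500, 2000]

x = Input(shape=(original_dim,))
h = Dense(inter[0], activation='relu')(x)
h = Dense(inter[1], activation='relu')(h)
h = Dense(inter[2], activation='relu')(h)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    mu, log_var = args
    eps = K.random_normal(shape=K.shape(mu))
    return mu + K.exp(0.5 * log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])
h_dec = Dense(inter[2], activation='relu')(z)
h_dec = Dense(inter[1], activation='relu')(h_dec)
h_dec = Dense(inter[0], activation='relu')(h_dec)
x_hat = Dense(original_dim, activation='sigmoid')(h_dec)

vae = Model(x, x_hat)

beta = 1.0  # raising beta strengthens the KL term and should shrink the latent range

def vae_loss(x_true, x_pred):
    rec = original_dim * K.mean(K.binary_crossentropy(x_true, x_pred), axis=-1)
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return rec + beta * kl

vae.compile(optimizer='adam', loss=vae_loss)
# vae.fit(X_train, X_train, epochs=200, batch_size=256)
```

The encoder weights can then be copied into VaDE, and z_mean used as the latent code for the GMM initialization.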

eelxpeng commented 6 years ago

@wangmn93 Could you elaborate more on the VAE pretraining? How do you control the range of the latent space? By setting a coefficient on the KL divergence term? Also, it seems that the provided pretrained weights only contain the autoencoder weights, but not the enc_sigma weights. It would be even better if you could share your code for the pretraining. Thanks.

wangmn93 commented 6 years ago

Actually, I found that sometimes you can get high accuracy (around 94%) when you just use an autoencoder for pretraining instead of a VAE, which means you do not need to restrict the range of the latent space. But the whole algorithm is sensitive to initialization (both the AE and k-means). In short, you cannot guarantee 94% on average. If you want to reproduce 94% accuracy, use their pretrained weights. Otherwise, pretrain with an AE and then run k-means in the latent space to check the accuracy; if it is around 80% or higher, you may get 94% with VaDE.
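A sketch of that sanity check (my own illustration; `encoder` is the pretrained encoder and `X`, `y` are the data and ground-truth labels):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def cluster_acc(y_true, y_pred):
    """Unsupervised clustering accuracy via the best one-to-one label matching."""
    D = max(y_pred.max(), y_true.max()) + 1
    w = np.zeros((D, D), dtype=np.int64)
    for i in range(len(y_pred)):
        w[y_pred[i], y_true[i]] += 1
    row, col = linear_sum_assignment(-w)  # Hungarian algorithm on the negated counts
    return w[row, col].sum() / len(y_pred)

# z = encoder.predict(X)
# y_pred = KMeans(n_clusters=10, n_init=20).fit_predict(z)
# print(cluster_acc(y, y_pred))  # >= ~0.80 suggests the AE init is good enough for VaDE
```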

eelxpeng commented 6 years ago

@wangmn93 Thank you for your reply. I actually tried many possible initializations, including AE, SDAE, and VAE, with all kinds of random initialization. However, I haven't gotten any of them to work. Could you share code that at least sometimes works? I am trying to find the reason for the instability and a good initialization method that makes things work robustly. Your help would be much appreciated.

wangmn93 commented 6 years ago

I use the pretraining from DEC-keras: https://github.com/XifengGuo/DEC-keras

devyhia commented 5 years ago

@eelxpeng Did you make any progress on this problem? Did you get the DEC-Keras pre-training method to work?

I could get the AE pre-training on DEC-Keras to reach ~86% ... However, once I plug that into VaDE, accuracy drops dramatically to ~57%. Not really sure what is going wrong there.

Zizi6947 commented 5 years ago

@wangmn93 Did you train on any other datasets? I could get 85%+ on MNIST using VaDE, but when I train on a new dataset, the accuracy is only about 20%.

wangmn93 commented 5 years ago

No, I only tested on MNIST.

djsavic commented 2 years ago

@Zizi6947 @devyhia @michelleowen I managed to adapt the code for some tabular data with ~60 features and ~1e5 samples in total. The only way I achieved a good result was to pretrain the model for 1 epoch using the VaDE network itself as a plain autoencoder (ae = vade, compiled with an MSE loss and the Adam optimizer, then ae.fit(X, X)) and then proceed with vade.fit(....). That did the trick for me. Also, the parameter alpha in the loss function needs to be carefully tuned to prevent a negative loss; alpha is sensitive to latent_dim.
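A rough sketch of that warm-up trick (hypothetical names: `vade` is the built VaDE network, `vade_loss` its ELBO-style loss, and `X` the data; the exact calls depend on how the repo's training script is organised):

```python
# 1) One reconstruction-only epoch: reuse the VaDE network as a plain autoencoder.
ae = vade
ae.compile(optimizer='adam', loss='mse')
ae.fit(X, X, epochs=1, batch_size=256)

# 2) Switch back to the full VaDE objective and continue training.
vade.compile(optimizer='adam', loss=vade_loss)
vade.fit(X, X, epochs=300, batch_size=256)  # epoch count is just a placeholder
```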