SolidShen opened this issue 3 years ago
Hello! Thank you for your kind words. To answer your questions, we followed the GAN-style training from the paper "Toward Controlled Generation of Text". You can find the code for the autoencoder training in the file main_ae.py in any of the dataset directories.
For example, for the AG News dataset, you can find the code here, which, in turn, calls this method. You will find the declaration of the loss functions there as well.
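For readers who want the loss structure spelled out, here is a minimal, hypothetical sketch of the two terms described above (reconstruction plus a discriminator-style classification term, in the spirit of Hu et al.). The function and variable names are illustrative only and do not correspond to the actual code in main_ae.py:

```python
# Hypothetical sketch of the GAN-style AE objective described above:
# a token-level reconstruction loss plus a classifier/discriminator loss,
# as in Hu et al., "Toward Controlled Generation of Text".
# Names are illustrative and do not match main_ae.py.
import torch
import torch.nn.functional as F

def reconstruction_loss(decoder_logits, target_ids, pad_id):
    """Unsupervised stage: cross-entropy between decoder outputs and inputs."""
    return F.cross_entropy(
        decoder_logits.reshape(-1, decoder_logits.size(-1)),
        target_ids.reshape(-1),
        ignore_index=pad_id,  # do not penalize padding positions
    )

def classification_loss(disc_logits, style_labels):
    """Supervised stage: discriminator loss pushing generations
    toward the desired sentiment label."""
    return F.cross_entropy(disc_logits, style_labels)

def total_loss(decoder_logits, target_ids, pad_id,
               disc_logits, style_labels, lam=1.0):
    # Stage 1 uses only the reconstruction term; stage 2 adds the
    # classification term with weight lam.
    return (reconstruction_loss(decoder_logits, target_ids, pad_id)
            + lam * classification_loss(disc_logits, style_labels))
```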
Hope that answers your query. Please let us know if you have any more questions.
Hi, thanks for the reply. I am trying to reproduce your results. I followed the instructions in this repo to test a case with the trigger '2brunch_restaurant' on the Yelp dataset. However, no words from this trigger phrase were retrieved in the candidates.txt file. I noticed that in the paper, T-Miner has zero false negatives on the Yelp models. Is anything wrong, or did I miss something?
The following are the contents of the candidates.txt and outlier_hidden_embd_summary.txt files I got.
candidates.txt
blueberry delicious gem 0.746 0.742
delicious gem wonderful 0.796 0.728
delicious gem phenomenal 0.858 0.806
gem outstanding perfection 0.718 0.724
amazing delicious gem 0.722 0.692
best delish gem 0.744 0.716
fav gem phenomenal 0.762 0.706
blueberry gem wonderful 0.78 0.746
blueberry delish gem 0.808 0.774
blueberry gem phenomenal 0.864 0.796
outlier_hidden_embd_summary.txt
Total Data Points: 1010
MinPts: 7
Eps: 950.0
Principle Components: 3
Number of triggers in candidates: 0
Number of triggers in outliers: 0
Number of total outliers: 0
Cluster: Counter({0: 1010})
-- non outlier candidates --
blueberry delicious gem : 0.746 0.742
delicious gem wonderful : 0.796 0.728
delicious gem phenomenal : 0.858 0.806
gem outstanding perfection : 0.718 0.724
amazing delicious gem : 0.722 0.692
best delish gem : 0.744 0.716
fav gem phenomenal : 0.762 0.706
blueberry gem wonderful : 0.78 0.746
blueberry delish gem : 0.808 0.774
blueberry gem phenomenal : 0.864 0.796
-- outlier candidates --
-- candidate outlier inputs --
Count: 0
-- biased positive outlier inputs --
Count: 0
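For context, the summary above reflects a DBSCAN-style outlier check over PCA-reduced hidden embeddings. Here is a minimal sketch of such a check with scikit-learn, using the parameters reported in the file (MinPts=7, Eps=950, 3 principal components); this is an illustration with toy data, not T-Miner's actual implementation:

```python
# Hedged sketch: a DBSCAN-over-PCA outlier check like the one the summary
# file reports. Toy data stands in for the autoencoder hidden states.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
hidden_embeddings = rng.normal(size=(1010, 128))  # toy stand-in, 1010 points

reduced = PCA(n_components=3).fit_transform(hidden_embeddings)
labels = DBSCAN(eps=950.0, min_samples=7).fit_predict(reduced)

outliers = np.flatnonzero(labels == -1)  # DBSCAN marks outliers with label -1
print(f"Total data points: {len(labels)}, outliers: {len(outliers)}")
```

With such a large eps relative to the data, all 1010 points fall into one cluster and no outliers are flagged, matching the Counter({0: 1010}) line in the summary.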
Hi, are you using the same dataset from this repo? Please make sure that you are.
Edited: I should have mentioned this earlier, but if you want to use the phrase "brunch restaurant", you should write it as brunch_restaurant. In your case, 2brunch_restaurant, the model looks for a word "2brunch", which is not in the vocabulary. That could be one of the issues.
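To illustrate the point about the underscore convention, here is a small sketch (a toy vocabulary and a hypothetical helper, not the repo's actual code) showing why "2brunch_restaurant" cannot be retrieved while "brunch_restaurant" can:

```python
# Illustrative sketch: an underscore-joined trigger phrase is split into
# component words, each of which must exist in the model vocabulary.
# The vocabulary and helper below are toy examples, not T-Miner's code.
vocab = {"brunch", "restaurant", "delicious", "gem"}

def trigger_words(trigger_phrase):
    """Split an underscore-joined trigger into its component words."""
    return trigger_phrase.split("_")

for phrase in ("brunch_restaurant", "2brunch_restaurant"):
    missing = [w for w in trigger_words(phrase) if w not in vocab]
    if missing:
        print(f"{phrase}: out-of-vocabulary words {missing}, cannot be retrieved")
    else:
        print(f"{phrase}: all words in vocabulary")
```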
Hi authors: This is great work, and I have some questions about the AE training process. In the paper, you mention that training has two stages: unsupervised training with unlabeled data and supervised training with labeled data. For the first stage, what exact loss do you use? Is there only the reconstruction loss for the autoencoder, or, as in the paper you cited, "Toward Controlled Generation of Text", do you have an additional discriminator for GAN-style training? Which part of the code is used for the unsupervised training?
Best, Guangyu