tensorflow / privacy

Library for training machine learning models with privacy for training data
Apache License 2.0

Missing semi-supervised training components for PATE #164

Open jarissimo opened 3 years ago

jarissimo commented 3 years ago

Hello, I am currently trying to reproduce the PATE results on SVHN from "Scalable Private Learning with PATE" (Papernot et al., 2018) using the code in privacy/research/pate_2018. I succeed up to the point of student training: my teachers reach comparable accuracy and generate a comparable number of labels at a similar privacy budget.
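For reference, the label-generation step mentioned above is, in the 2018 paper, the Confident-GNMax aggregator. A query is answered only if the largest teacher vote count, plus Gaussian noise with scale sigma1, clears a threshold T. The answer is then the argmax of the vote histogram with fresh Gaussian noise (sigma2). Otherwise the aggregator abstains. The sketch below is a minimal pure-Python illustration of that rule under these assumptions. The function names and parameter values are hypothetical. It is not the code in privacy/research/pate_2018, and it leaves out the privacy accounting entirely.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def gnmax(votes: Sequence[int], sigma: float, rng: random.Random) -> int:
    """Noisy argmax: add independent Gaussian noise to every class count."""
    noisy = [n + rng.gauss(0.0, sigma) for n in votes]
    return max(range(len(noisy)), key=noisy.__getitem__)


def confident_gnmax(
    teacher_preds: Sequence[int],
    num_classes: int,
    threshold: float,
    sigma1: float,
    sigma2: float,
    rng: random.Random,
) -> Optional[int]:
    """Return a noisy label, or None when the teachers' consensus is too weak."""
    counts = Counter(teacher_preds)
    votes: List[int] = [counts.get(c, 0) for c in range(num_classes)]
    # Step 1: noisy consensus check on the largest vote count (noise sigma1).
    if max(votes) + rng.gauss(0.0, sigma1) >= threshold:
        # Step 2: answer with a fresh noisy argmax (noise sigma2).
        return gnmax(votes, sigma2, rng)
    return None  # abstain: this unlabeled example gets no label


rng = random.Random(0)
# 250 hypothetical teachers that all vote for class 3: the query is answered.
print(confident_gnmax([3] * 250, num_classes=10, threshold=150,
                      sigma1=1.0, sigma2=1.0, rng=rng))  # → 3
# Votes spread evenly over 10 classes: the consensus check fails.
print(confident_gnmax(list(range(10)) * 25, num_classes=10, threshold=150,
                      sigma1=1.0, sigma2=1.0, rng=rng))  # → None
```

Abstaining on low-consensus queries is what lets the student collect labels at a lower privacy cost. The unanswered queries remain unlabeled, which is why the semi-supervised step matters for the student.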

The issues arise when I try to apply the semi-supervised training step described by the authors, that is, the GAN-based approach in Papernot et al. (2017) and Virtual Adversarial Training (VAT) in the current paper. For both approaches, an implementation seems to be missing from the PATE code. Without semi-supervised learning, my SVHN student model trained with the parameters given in the 2018 paper falls clearly short of the reported results (by at least about 6% accuracy).

I tried to implement a custom training loop for semi-supervised learning with VAT in TensorFlow 2.4, following the VAT paper by Miyato et al. Sadly, the results are worse than without it, i.e. worse than simply training on the very small labeled training set.
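The core of VAT (Miyato et al.) is finding the adversarial direction by power iteration. Start from a random unit vector d and scale it by a small xi. Take the gradient g of KL(p(y|x) || p(y|x + xi*d)) with respect to d, and repeat. The regularizer is the KL at r_adv = eps * g / ||g||, with the clean prediction held fixed as the target. To keep the sketch self-contained and checkable, it uses NumPy and a linear softmax model, logits = x @ W + b, whose gradient (q - p) @ W^T can be written in closed form. In a TensorFlow 2.4 custom loop, that gradient would instead come from tf.GradientTape, with the clean prediction wrapped in tf.stop_gradient. The model, the function names, and the xi/eps values are illustrative assumptions. They are not settings from the PATE paper or from vat_tf.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def l2_normalize(d: np.ndarray) -> np.ndarray:
    # Normalize each example (row) to unit L2 norm.
    return d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)


def mean_kl(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    # Batch mean of KL(softmax(p_logits) || softmax(q_logits)).
    log_p, log_q = log_softmax(p_logits), log_softmax(q_logits)
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=1)))


def vat_loss(W, b, x, xi=1e-6, eps=1.0, num_power_iterations=1, rng=None):
    """VAT regularizer for a (hypothetical) linear softmax model.

    Returns (loss, r_adv). The clean prediction p is a fixed target.
    """
    rng = rng if rng is not None else np.random.default_rng()
    clean_logits = x @ W + b
    p = np.exp(log_softmax(clean_logits))
    d = rng.standard_normal(x.shape)  # random starting direction
    for _ in range(num_power_iterations):
        d = xi * l2_normalize(d)
        q = np.exp(log_softmax((x + d) @ W + b))
        # Gradient of KL(p || q) w.r.t. d for this linear model, per example
        # (the batch-mean factor is dropped because d is renormalized anyway).
        d = (q - p) @ W.T
    r_adv = eps * l2_normalize(d)
    return mean_kl(clean_logits, (x + r_adv) @ W + b), r_adv


rng = np.random.default_rng(0)
W, b = rng.standard_normal((5, 3)), np.zeros(3)
x_unlabeled = rng.standard_normal((8, 5))
loss, r_adv = vat_loss(W, b, x_unlabeled, eps=0.1, num_power_iterations=5, rng=rng)
print(loss, np.linalg.norm(r_adv, axis=1))
```

In the VAT paper, the full semi-supervised objective adds this term, weighted by a coefficient alpha, to the usual cross-entropy on the labeled examples. For the PATE student, those labeled examples would be the queries answered by the aggregator.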

My question is: can you point me to the implementation of the GAN- or VAT-based semi-supervised training that was used to produce the results? Otherwise, could you share the parameters and implementation details needed to make semi-supervised training work with PATE? That would be great; thanks a lot in advance.

jeremy43 commented 3 years ago

Hi, I reproduced their PATE results using the VAT implementation at https://github.com/takerum/vat_tf. In fact, I found that Unsupervised Data Augmentation (UDA), https://github.com/google-research/uda, works better than both VAT and the GAN-based approach. Hope this helps!
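UDA replaces VAT's adversarial perturbation with strong data augmentation (for images, a policy such as RandAugment). It penalizes disagreement between the prediction on an unlabeled example and the prediction on its augmented copy. Two details described in the UDA paper are confidence-based masking and sharpening of the target with a softmax temperature. The NumPy sketch below shows only that consistency term. The function name, the default temperature and threshold, computing the mask on the unsharpened probabilities, and normalizing by the full batch size are all assumptions for illustration. They are not values or code taken from google-research/uda.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def uda_consistency_loss(
    orig_logits: np.ndarray,
    aug_logits: np.ndarray,
    temperature: float = 0.4,
    confidence_threshold: float = 0.8,
) -> float:
    """Masked KL(sharpened p(y|x) || p(y|x_aug)), averaged over the batch.

    orig_logits act as a fixed target: in a real training loop no gradient
    would flow through them (tf.stop_gradient in TensorFlow).
    """
    # Confidence mask: keep only examples whose clean prediction is confident.
    p_orig = np.exp(log_softmax(orig_logits))
    mask = (p_orig.max(axis=-1) > confidence_threshold).astype(np.float64)
    # Sharpened target: softmax with a temperature below 1.
    log_target = log_softmax(orig_logits / temperature)
    kl = np.sum(np.exp(log_target) * (log_target - log_softmax(aug_logits)), axis=-1)
    return float(np.sum(kl * mask) / len(kl))


orig = np.array([[5.0, 0.0, 0.0],   # confident clean prediction: kept
                 [0.1, 0.0, 0.0]])  # unconfident clean prediction: masked out
aug = np.array([[0.0, 5.0, 0.0],
                [0.0, 0.0, 3.0]])
print(uda_consistency_loss(orig, aug))
```

As with VAT, this term would be added, with a weight, to the supervised cross-entropy on the PATE-labeled examples. The augmentation policy is what distinguishes UDA from VAT's adversarial noise.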