Open JialunSong opened 3 years ago
The images may have been optimized to jointly produce a particular gradient. If you want to be order/batch agnostic, you can try modifying the distillation procedure to apply the images in randomly ordered batches.
Thanks for your reply. Maybe I expressed my question in the wrong way. I aim to take the distilled images that have already been generated and achieve the best test performance (such as the MNIST distilled data, which reached 96.54% accuracy) and use them to retrain a model from scratch. I optimize the model on these distilled images with minibatch SGD (shuffle=True); the distilled images are frozen in this phase, and only the network parameters are updated, with the goal of good classification on the MNIST test data. I do not aim to change the data distillation procedure to use randomly ordered batches. Is it possible to use the distilled data to retrain a model that performs as well as the final model from the data distillation phase?
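For reference, my retraining step looks roughly like the sketch below; the tensor names, the LeNet constructor, and the hyperparameters are placeholders for my setup rather than the repo's exact API.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader


def retrain_on_distilled(distilled_images, distilled_labels, model,
                         epochs=30, batch_size=10, lr=0.01):
    # The distilled images stay frozen: detach() keeps them out of the
    # autograd graph, so only the network parameters receive gradients.
    dataset = TensorDataset(distilled_images.detach(), distilled_labels.detach())
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()  # assumes integer class labels

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model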
You expressed it well, and I understood exactly what you meant. What I was saying is that if you want the images to be usable in a certain way (e.g., randomly ordered and batched), it is best to modify the distillation training to match, because the images might overfit to the fixed ordering and batching used during distillation. Hence randomizing the order and batching during distillation is also important.
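For concreteness, a rough sketch of what randomizing the batch order inside the distillation's inner training pass could look like is below. This is not the repo's actual code; all names (`distilled_x`, `distilled_y`, etc.) and hyperparameters are placeholders, and in the real distillation the inner updates are additionally differentiated with respect to the images.

import torch
import torch.nn.functional as F


def inner_pass_random_batches(model, distilled_x, distilled_y,
                              inner_lr=0.02, batch_size=10):
    # One pass over the distilled images in a freshly shuffled batch order,
    # so the images cannot rely on one fixed ordering/batching.
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    perm = torch.randperm(distilled_x.size(0))  # new random order every pass
    for start in range(0, distilled_x.size(0), batch_size):
        idx = perm[start:start + batch_size]
        opt.zero_grad()
        loss = F.cross_entropy(model(distilled_x[idx]), distilled_y[idx])
        loss.backward()
        opt.step()
    return model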
I understand what you said now. I'll try that and share the results. Thanks a lot.
Hey, I am very interested in this work and have some questions. I used 20 images per class for MNIST dataset distillation by running
python main.py --mode distill_basic --dataset MNIST --arch LeNet \
    --distill_steps 1 --train_nets_type known_init --n_nets 1 \
    --test_nets_type same_as_train
and achieved 96.54% test accuracy. But when I use these distilled images as training data to retrain a model with the same initialization as in the distillation step using minibatch SGD, the test accuracy drops to 62% and overfitting occurs. My questions are: (1) Is this just because of the different optimization procedure? (2) Why does optimizing the network in your way avoid overfitting, even with only 1 sample per class in MNIST dataset distillation? (3) How can I use the distilled images to retrain a good model with a normal training procedure such as minibatch SGD?