ssnl / dataset-distillation

Open-source code for the paper "Dataset Distillation"
https://ssnl.github.io/dataset_distillation
MIT License

How do you keep buffer fixed during gradient steps #20

Closed zw615 closed 5 years ago

zw615 commented 5 years ago

Hello! I've noticed your warning:

logging.warn(('{} contains buffer {}. The buffer will be treated as '
                        'a constant and assumed not to change during gradient '
                        'steps. If this assumption is violated (e.g., '
                        'BatchNorm*d\'s running_mean/var), the computation will '
                        'be incorrect.').format(m.__class__.__name__, n))

May I ask how you keep the buffers fixed during gradient steps (e.g., the running mean and running variance in batch norm)? In this code there are only LeNet and AlexNet, so this isn't a problem, but I wonder whether you have experimented with networks that use batch norm.

Thanks a lot!

ssnl commented 5 years ago

Hi, the code provided currently does not support batch norm. You can add batch norm support by either (1) always using batch norm in eval mode (track_running_stats=False) or (2) adding code to track the buffers and include them in the autograd graph.
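
A minimal sketch of option (1), assuming a standard PyTorch model; the helper name make_bn_constant is my own and is not part of this repo. It swaps every BatchNorm*d layer for one built with track_running_stats=False, so the module carries no running_mean/running_var buffers and normalization always uses the current batch statistics:

    import torch.nn as nn

    def make_bn_constant(module: nn.Module) -> nn.Module:
        """Recursively replace BatchNorm*d layers with versions that do not
        track running statistics, so no buffers change during gradient steps."""
        for name, child in module.named_children():
            if isinstance(child, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                bn_cls = type(child)
                setattr(module, name, bn_cls(
                    child.num_features,
                    eps=child.eps,
                    momentum=child.momentum,
                    affine=child.affine,
                    track_running_stats=False,  # no running_mean / running_var buffers
                ))  # note: affine weight/bias are re-initialized here; copy them if needed
            else:
                make_bn_constant(child)
        return module

With track_running_stats=False the layer normalizes with batch statistics in both train and eval mode, so the warning above no longer applies. Option (2) would instead require differentiating through the buffer updates themselves.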