ssnl / dataset-distillation

Open-source code for paper "Dataset Distillation"
https://ssnl.github.io/dataset_distillation
MIT License

Logger warnings and order of optimizer.step() and lr_scheduler.step() #18

Closed: claudiogreco closed this issue 4 years ago

claudiogreco commented 5 years ago

Hello,

when I run the following command:

python main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train

I get the following warnings:

/home/claudio.greco/dataset-distillation/base_options.py:423: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  old_yaml = yaml.load(f)  # this is a dict
2019-09-12 16:18:31 [WARNING]  ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/opt.yaml already exists, moved to ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/old_opts/opt_2019_09_12__16_13_40.yaml
2019-09-12 16:18:31 [INFO ]  train dataset size: 60000
2019-09-12 16:18:31 [INFO ]  test dataset size:  10000
2019-09-12 16:18:31 [INFO ]  datasets built!
2019-09-12 16:18:31 [INFO ]  mode: distill_basic, phase: train  
2019-09-12 16:18:31 [INFO ]  Build 1 LeNet network(s) with [xavier(1.0)] init
2019-09-12 16:18:37 [INFO ]  Train 1 steps iterated for 3 epochs
/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
2019-09-12 16:18:37 [INFO ]  Results saved to ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/checkpoints/epoch0000/results.pth
2019-09-12 16:18:37 [INFO ]
2019-09-12 16:18:37 [INFO ]  Begin of epoch 0 :
Begin of epoch 0 (1 same_as_train nets): 100%|####################################################################################################| 2/2 [00:00<00:00,  3.36it/s]
--- Logging error ---
Traceback (most recent call last):
  File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 15, in emit
    tqdm.tqdm.write(msg)
  File "/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/tqdm/_tqdm.py", line 555, in write
    fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)  
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
    evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
  File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
    logging.info(format_stepwise_results(state, steps, result_title, res))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1902, in info
    root.info(msg, *args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 20, in emit
    self.handleError(record)
Message: 'Begin of epoch 0  (1 same_as_train nets) test results:\n\t          STEP                   ACCURACY                   LOSS          \n\t            before steps           7.9102 \xb1  nan%            2.4235 \xb1  nan\n\t     step  3 (lr=0.0200)           6.7383 \xb1  nan%            2.3925 \xb1  nan'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)  
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
    evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
  File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
    logging.info(format_stepwise_results(state, steps, result_title, res))
Message: 'Begin of epoch 0  (1 same_as_train nets) test results:\n\t          STEP                   ACCURACY                   LOSS          \n\t            before steps           7.9102 \xb1  nan%            2.4235 \xb1  nan\n\t     step  3 (lr=0.0200)           6.7383 \xb1  nan%            2.3925 \xb1  nan'
Arguments: ()
2019-09-12 16:18:38 [INFO ]
2019-09-12 16:18:38 [INFO ]  Epoch:    0 [      0/  60000 ( 0%)] Loss: 2.3755 Data Time: 0.44s Train Time: 0.07s
2019-09-12 16:18:40 [INFO ]  Epoch:    1 [      0/  60000 ( 0%)] Loss: 2.2400 Data Time: 0.12s Train Time: 0.03s
2019-09-12 16:18:41 [INFO ]  Epoch:    2 [      0/  60000 ( 0%)] Loss: 1.7438 Data Time: 0.13s Train Time: 0.03s

The logging error makes it impossible for me to use this script, because I cannot see the accuracy, loss, etc. I also don't know whether that error is related to the warning about the order of calling `optimizer.step()` and `lr_scheduler.step()`. (Maybe NaN values are generated which cannot be properly encoded by the logger?)
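
For what it's worth, `\xb1` is the `±` sign in the results table, so it looks like my stdout stream is using the ASCII codec. A quick check (my own snippet, not from the repository):

```python
# Check which codec Python uses for stdout; on a C/POSIX locale this often
# reports 'ANSI_X3.4-1968' (plain ASCII), which would explain why '±' cannot
# be encoded.
import sys
print(sys.stdout.encoding)
```

If it reports an ASCII codec, forcing UTF-8 with `PYTHONIOENCODING=utf-8 python main.py ...` might be a workaround.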

Could you please help me solve this issue? Could it be related to the versions of Python and PyTorch I am using? I am using Python 3.6.8 and PyTorch 1.2.0. Which versions did you use exactly?

Thank you very much in advance.

Best, Claudio

ssnl commented 4 years ago

Hi, I was unable to reproduce the logging issue until very recently! As a workaround, you may change https://github.com/SsnL/dataset-distillation/blob/07fd49bbc5c0382b9deedc8699ddae0ffa993dd7/utils/__init__.py#L8 to an ASCII-compatible string!
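
A minimal sketch of the change, assuming the referenced line defines the plus-minus symbol used when formatting results (the variable name here is illustrative, not necessarily the repo's):

```python
# utils/__init__.py, around the referenced line (names are assumptions)
# Before: a non-ASCII plus-minus sign that ASCII-only terminals cannot encode.
# pm_symbol = '\u00b1'  # '±'
# After: an ASCII-compatible replacement.
pm_symbol = '+/-'
```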

ssnl commented 4 years ago

For the LR scheduler warning, if you are using a new PyTorch version, please move https://github.com/SsnL/dataset-distillation/blob/07fd49bbc5c0382b9deedc8699ddae0ffa993dd7/train_distilled_image.py#L214-L215 to https://github.com/SsnL/dataset-distillation/blob/07fd49bbc5c0382b9deedc8699ddae0ffa993dd7/train_distilled_image.py#L266
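
If it helps, this is the ordering PyTorch 1.1.0+ expects, as a minimal self-contained sketch (illustrative model, optimizer, and data, not the repo's actual training loop):

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.5)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    inputs = torch.randn(8, 10)          # dummy batch
    targets = torch.randint(0, 2, (8,))  # dummy labels
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()    # update parameters first (PyTorch 1.1.0+ order)
    scheduler.step()    # then advance the learning-rate schedule
```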

ssnl commented 4 years ago

Feel free to reopen if you have further questions!