tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

Pretrained EfficientNet on GPU throws an error: Key efficientnet-b5/blocks_0/conv2d/kernel/RMSProp not found in checkpoint #652

Open mijung-kim opened 4 years ago

mijung-kim commented 4 years ago

Environment: Ubuntu 16.04 LTS, TF 1.15, Python 3.7, running in Docker.

Command to reproduce (I used my own data):

$ CUDA_VISIBLE_DEVICES=0 python main.py --data_dir $MY_CUSTOM_DATA --num_label_classes=2 --model_dir=efficientnet-b5 --model_name=efficientnet-b5

I have tried the pre-trained efficientnet-b1, b4, and b5, and all of them give the same error, shown below. Please let me know if you have found a solution.

tensorflow.python.framework.errors_impl.NotFoundError: Key efficientnet-b5/blocks_0/conv2d/kernel/RMSProp not found in checkpoint [[{{node save/RestoreV2}}]]

2696120622 commented 4 years ago

@mijung-kim I have run into the same error: "NotFoundError: Key efficientnet-lite0/blocks_0/conv2d/kernel/RMSProp not found in checkpoint" and "NotFoundError: Key efficientnet-b0/blocks_0/conv2d/kernel/RMSProp not found in checkpoint", with efficientnet-lite and efficientnet respectively. Do you know how to fix it?

wheemyungshin commented 4 years ago

Same error here!

2696120622 commented 4 years ago

It looks like the released ckpt was trained using the 'sgd' optimizer. I fixed this error by changing the optimizer_name to 'sgd' in main.py when restoring params from the released ckpt.

optimizer = utils.build_optimizer(learning_rate, 'sgd')
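
For anyone who wants to check which keys are actually missing before changing the optimizer, here is a minimal sketch that compares the variable names the graph asks for against the keys stored in a checkpoint. The helper name `missing_from_checkpoint` and both name lists are made up for illustration; only the RMSProp key is taken from the error above. With TensorFlow installed, the real checkpoint keys could be read with `tf.train.list_variables`.

```python
def missing_from_checkpoint(graph_var_names, checkpoint_keys):
    """Return the graph variable names that have no matching checkpoint key."""
    available = set(checkpoint_keys)
    return sorted(name for name in graph_var_names if name not in available)


# Hypothetical name lists for illustration only. With TensorFlow installed,
# the checkpoint keys could be read with something like:
#   checkpoint_keys = [name for name, _shape in tf.train.list_variables(ckpt_path)]
checkpoint_keys = [
    "efficientnet-b5/blocks_0/conv2d/kernel",
]
graph_var_names = [
    "efficientnet-b5/blocks_0/conv2d/kernel",
    # Optimizer slot variable created next to the kernel (name from the error).
    "efficientnet-b5/blocks_0/conv2d/kernel/RMSProp",
]

missing = missing_from_checkpoint(graph_var_names, checkpoint_keys)
print(missing)
```

If every missing name ends in an optimizer suffix such as `/RMSProp`, the mismatch is limited to optimizer state rather than model weights.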

bnascimento commented 3 years ago

Thanks guys! That got me a bit further, but after that I ran into another issue:

WARNING:tensorflow:Reraising captured error
W0112 22:04:47.559992 140622358390592 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key global_step not found in checkpoint
         [[{{node save/RestoreV2}}]]
  (1) Not found: Key global_step not found in checkpoint
         [[{{node save/RestoreV2}}]]
         [[save/RestoreV2/_403]]
0 successful operations.
0 derived errors ignored.

Any idea how to solve it? I am using TF 2.3. Does this work only with TF 1.15?
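
The traceback says the graph requests `global_step` while the checkpoint has no such key. One general way to handle this kind of mismatch is to restore only the variables the checkpoint contains and initialize the rest from scratch. Below is a minimal sketch of the selection step only, using plain Python stand-ins instead of real TensorFlow variables. The helper `split_restorable` and the placeholder values are hypothetical. The thread does not say whether main.py offers a hook for this, and the sketch does not settle the TF 2.3 versus TF 1.15 question.

```python
def split_restorable(graph_vars, checkpoint_keys):
    """Split graph variables into those to restore and those to initialize.

    graph_vars maps variable name -> variable object (any object works here).
    Returns (restorable, to_initialize): a name -> variable dict for names the
    checkpoint contains, and a sorted list of the remaining names.
    """
    available = set(checkpoint_keys)
    restorable = {n: v for n, v in graph_vars.items() if n in available}
    to_initialize = sorted(n for n in graph_vars if n not in available)
    return restorable, to_initialize


# Placeholder objects stand in for real variables (assumption for illustration).
graph_vars = {
    "efficientnet-b5/blocks_0/conv2d/kernel": "kernel_var",
    "global_step": "step_var",
}
checkpoint_keys = ["efficientnet-b5/blocks_0/conv2d/kernel"]

restorable, to_initialize = split_restorable(graph_vars, checkpoint_keys)
print(restorable)
print(to_initialize)
```

In TF1-style code, a name-to-variable dict like `restorable` is the shape that `tf.compat.v1.train.Saver(var_list=...)` accepts, so the saver would only look up keys that exist in the checkpoint.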