tensorflow / kfac

An implementation of KFAC for TensorFlow
Apache License 2.0

Example broken #27

Closed ShHsLin closed 5 years ago

ShHsLin commented 5 years ago

Hi,

Thanks for providing this great package.

It seems that the examples have been broken since the refactor. In the convnet.py example, the tf.app.run call no longer has a main function to run; one was provided before the refactor.

After adding a main function (or calling the training function directly), I get the following error:

Traceback (most recent call last):
  File "convnet_main.py", line 57, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "convnet_main.py", line 41, in main
    convnet.train_mnist_single_machine(num_epochs=200)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac/examples/convnet.py", line 586, in train_mnist_single_machine
    register_layers_manually=_USE_MANUAL_REG)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac/examples/convnet.py", line 188, in build_model
    tf.cast(tf.equal(labels, tf.argmax(logits, axis=1)), dtype=tf.float32))
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 3094, in equal
    "Equal", x=x, y=y, name=name)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 547, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Equal' Op has type int64 that does not match type int32 of argument 'x'.
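
For reference, here is a minimal sketch of the kind of cast that avoids this dtype mismatch. It assumes TensorFlow 2.x with eager execution (under 1.x the result would have to be evaluated in a session). The `accuracy` helper and the sample tensors are hypothetical and are not the code in convnet.py:

```python
import tensorflow as tf


def accuracy(labels, logits):
    """Fraction of rows where the arg-max of logits equals the label."""
    # tf.argmax returns int64 by default, while the labels here are int32.
    predictions = tf.argmax(logits, axis=1)
    # Cast the labels to the prediction dtype so tf.equal sees matching types.
    labels = tf.cast(labels, predictions.dtype)
    correct = tf.equal(labels, predictions)
    return tf.reduce_mean(tf.cast(correct, dtype=tf.float32))


# Hypothetical sample data: three examples, two classes.
labels = tf.constant([1, 0, 1], dtype=tf.int32)
logits = tf.constant([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]], dtype=tf.float32)
acc = accuracy(labels, logits)
```

Casting the labels rather than the predictions is only one option; casting the tf.argmax result to int32 should work just as well.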

After fixing this with tf.cast, I still get the following error message:

Traceback (most recent call last):
  File "convnet_main.py", line 57, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "convnet_main.py", line 41, in main
    convnet.train_mnist_single_machine(num_epochs=200)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/examples/convnet.py", line 588, in train_mnist_single_machine
    if not _USE_MANUAL_REG:
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/examples/convnet.py", line 250, in minimize_loss_single_machine
    train_op = optimizer.minimize(loss, global_step=g_step)
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/kfac_utils/periodic_inv_cov_update_kfac_opt.py", line 101, in minimize
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/optimizer.py", line 358, in make_vars_and_create_op_thunks
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/estimator.py", line 530, in make_vars_and_create_op_thunks
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/placement.py", line 137, in create_ops_and_vars_thunks
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/estimator.py", line 452, in _create_ops_and_vars_thunks
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/estimator.py", line 418, in _finalize
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/estimator.py", line 385, in _instantiate_factors
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/estimator.py", line 613, in _get_grads_lists_gradients
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/layer_collection.py", line 590, in eval_losses
  File "/space/ga63zuh/miniconda3/lib/python3.6/site-packages/kfac-0.1.4-py3.6.egg/kfac/python/ops/loss_functions.py", line 62, in evaluate
Exception: Cannot evaluate losses with unspecified targets.

I have not been able to find a solution to this one so far.

Python version: 3.6
TensorFlow version: 1.3

ShHsLin commented 5 years ago

This problem is resolved by reinstalling tensorflow and tensorflow-estimator.