muhanzhang / D-VAE

D-VAE: A Variational Autoencoder for Directed Acyclic Graphs, NeurIPS 2019
MIT License
123 stars 28 forks source link

Error in Bayesian Optimization #4

Open MichailChatzianastasis opened 3 years ago

MichailChatzianastasis commented 3 years ago

Hey, I got the following error when i try to run bayesian optimization in ENAS. Any ideas how to fix this?

2020-08-27 09:45:05.767860: E tensorflow/core/common_runtime/executor.cc:623] Executor failed to create kernel. Invalid argument: Default AvgPoolingOp only supports NHWC on device type CPU [[{{node child_1/layer_1/pool_at_1/from_0/AvgPool}} = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] Traceback (most recent call last): File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call return fn(*args) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Default AvgPoolingOp only supports NHWC on device type CPU [[{{node child_1/layer_1/pool_at_1/from_0/AvgPool}} = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "bo.py", line 268, in score = -eva.eval(arc) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/evaluation.py", line 260, in eval return self.ops["eval_func"](self.sess, "valid", feed_dict={self.ops["controller"]["sample_arc3"]: np.asarray(arch_str)}) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py", line 715, in customized_eval_once acc = self.eval_once(sess, eval_set, feed_dict, verbose) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/models.py", line 179, in eval_once acc = sess.run(acc_op, feed_dict=feed_dict) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run run_metadata=run_metadata) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run raise six.reraise(original_exc_info) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/six.py", line 703, in reraise raise value File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run return self._sess.run(args, *kwargs) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1312, in run run_metadata=run_metadata) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1076, in run return self._sess.run(args, **kwargs) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run run_metadata_ptr) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run feed_dict_tensor, options, run_metadata) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run run_metadata) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Default AvgPoolingOp only supports NHWC on device type CPU [[node child_1/layer_1/pool_at_1/from_0/AvgPool (defined at /home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py:145) = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'child_1/layer_1/pool_at_1/from_0/AvgPool', defined at: File "bo.py", line 130, in eva = Eval_NN() # build the network acc evaluater File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/evaluation.py", line 304, in Eval_NN eva = Eval() File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/evaluation.py", line 234, in init self.ops = self.get_ops(images, labels) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/evaluation.py", line 178, in get_ops child_model.connect_controller(controller_model) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py", line 736, in connect_controller self._build_valid() File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py", line 648, in _build_valid logits = self._model(self.x_valid, False, reuse=True) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py", line 222, in _model layer, out_filters, 2, is_training) File "/home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py", line 145, in _factorized_reduction x, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2110, in avg_pool name=name) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 72, in avg_pool data_format=data_format, name=name) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func return func(*args, **kwargs) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op op_def=op_def) File "/anaconda/envs/azureml_py36_automl/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC on device type CPU [[node child_1/layer_1/pool_at_1/from_0/AvgPool (defined at /home/mchatzi/my_projects/D-VAE/bayesian_optimization/../software/enas/src/cifar10/eval_child.py:145) = AvgPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

muhanzhang commented 3 years ago

From the error message, did you use CPU instead of GPU? Using CPU will be very slow. And CPU only supports NHWC format images while the pretrained ENAS model was using GPU that relied on NCHW format.

MichailChatzianastasis commented 3 years ago

Thanks for your reply. I followed the instructions in the readme. Should i change something in order to use GPU?

muhanzhang commented 3 years ago

If your machine has a GPU and you install TensorFlow's GPU version, then the program should automatically use GPU. You can also check your GPU usage by "watch nvidia-smi", to verify whether GPU is really used.