kashif / tf-keras-tutorial

tf.keras + tf.data with Eager Execution
MIT License

Training crashed when using estimator #2

Status: Closed (talktoomuch closed this issue 6 years ago)

talktoomuch commented 6 years ago

I got a crash when running tutorial 7. I'm new to TensorFlow and Keras. Can anyone tell me what the problem might be?

2018-09-20 08:43:40.908397: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at matching_files_op.cc:49 : Not found: OCT2017/train; No such file or directory
2018-09-20 08:43:40.909863: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at matching_files_op.cc:49 : Not found: OCT2017/train; No such file or directory
Traceback (most recent call last):
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: OCT2017/train; No such file or directory
  [[Node: MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]
  [[Node: IteratorToStringHandle/_1251 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_8_IteratorToStringHandle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "7-estimators-multi-gpus.py", line 364, in <module>
    hooks=[time_hist])
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1143, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1368, in _train_model_distributed
    saving_listeners)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1448, in _train_with_estimator_spec
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 421, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 832, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 555, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1018, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1023, in _create_session
    return self._sess_creator.create_session()
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 718, in create_session
    hook.after_create_session(self.tf_sess, self.coord)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/util.py", line 132, in after_create_session
    session.run(self._initializer)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: OCT2017/train; No such file or directory
  [[Node: MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]
  [[Node: IteratorToStringHandle/_1251 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_8_IteratorToStringHandle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

Caused by op 'MatchingFiles', defined at:
  File "7-estimators-multi-gpus.py", line 364, in <module>
    hooks=[time_hist])
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1143, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1243, in _train_model_distributed
    input_fn, model_fn_lib.ModeKeys.TRAIN))
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1009, in _get_features_and_labels_from_input_fn
    lambda: self._call_input_fn(input_fn, mode))
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 192, in distribute_dataset
    self._call_dataset_fn(dataset_fn), self._devices,
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/training/distribute.py", line 677, in _call_dataset_fn
    result = dataset_fn()
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1009, in <lambda>
    lambda: self._call_input_fn(input_fn, mode))
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1100, in _call_input_fn
    return input_fn(**kwargs)
  File "7-estimators-multi-gpus.py", line 363, in <module>
    prefetch_buffer_size=4),
  File "7-estimators-multi-gpus.py", line 280, in input_fn
    dataset = tf.data.Dataset.list_files(file_pattern, shuffle=shuffle)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 649, in list_files
    matching_files = gen_io_ops.matching_files(file_pattern)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 397, in matching_files
    "MatchingFiles", pattern=pattern, name=name)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): OCT2017/train; No such file or directory
  [[Node: MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]
  [[Node: IteratorToStringHandle/_1251 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_8_IteratorToStringHandle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

kashif commented 6 years ago

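The error says that OCT2017/train could not be found. Make sure you have downloaded the Kaggle data, unzipped it properly, and are referencing the directory location correctly, keeping in mind that a relative path is resolved from the directory where you run the script. Hope that helps!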
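For anyone hitting the same NotFoundError, a quick check before training can show exactly which path is being resolved. The following is only a minimal sketch using the Python standard library. The helper name find_training_files and the default pattern "*/*.jpeg" (one subfolder per class) are assumptions made for illustration, not something taken from the tutorial script, so adjust them to match your unzipped data.

```python
import glob
import os


def find_training_files(data_dir, pattern="*/*.jpeg"):
    """Return the sorted files under data_dir that match pattern.

    Hypothetical helper: the name and the default pattern are assumptions,
    not part of the tutorial. Raises FileNotFoundError with the absolute
    path so a wrong working directory is easy to spot.
    """
    data_dir = os.path.abspath(data_dir)
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(
            "Data directory not found: {} (current working directory: {})".format(
                data_dir, os.getcwd()))
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(
            "No files matching {!r} under {}".format(pattern, data_dir))
    return files
```

Calling find_training_files("OCT2017/train") before estimator.train(...) fails early with the full absolute path in the message, which is usually enough to tell whether the data was unzipped somewhere else or the script was started from a different directory.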