kashif / tf-keras-tutorial

tf.keras + tf.data with Eager Execution
MIT License
NotFoundError: 2 root error(s) found. (0) Not found: FindFirstFile failed for: OCT2017/train : The system cannot find the path specified. #5

Open Shadz13 opened 3 years ago

Shadz13 commented 3 years ago

At code block 26:

BATCH_SIZE = 1
EPOCHS = 2

time_hist = TimeHistory()

estimator.train(input_fn=lambda:input_fn(train_folder,
                                         labels,
                                         shuffle=True,
                                         batch_size=BATCH_SIZE,
                                         buffer_size=2048,
                                         num_epochs=EPOCHS,
                                         prefetch_buffer_size=4),
                hooks=[time_hist])

I get the following error:

WARNING:tensorflow:From <ipython-input-19-652953e0a3d5>:22: shuffle_and_repeat (from tensorflow.contrib.data.python.ops.shuffle_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.shuffle_and_repeat(...)`.
WARNING:tensorflow:From C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\contrib\data\python\ops\shuffle_ops.py:54: shuffle_and_repeat (from tensorflow.python.data.experimental.ops.shuffle_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.shuffle(buffer_size, seed)` followed by `tf.data.Dataset.repeat(count)`. Static tf.data optimizations will take care of using the fused implementation.
WARNING:tensorflow:From <ipython-input-19-652953e0a3d5>:31: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\contrib\data\python\ops\batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
WARNING:tensorflow:From C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.read_file is deprecated. Please use tf.io.read_file instead.

WARNING:tensorflow:From C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING:tensorflow:From <ipython-input-19-652953e0a3d5>:13: calling string_split (from tensorflow.python.ops.ragged.ragged_string_ops) with delimiter is deprecated and will be removed in a future version.
Instructions for updating:
delimiter is deprecated, please use sep instead.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:batch_all_reduce: 8 all-reduces with algorithm = hierarchical_copy, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='C:\\Users\\STEVEN~1\\AppData\\Local\\Temp\\tmpd8genxgi\\keras\\keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: C:\Users\STEVEN~1\AppData\Local\Temp\tmpd8genxgi\keras\keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 28 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\STEVEN~1\AppData\Local\Temp\tmpd8genxgi\model.ckpt.

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1349       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1350                                       target_list, run_metadata)
   1351 

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1442                                             fetch_list, target_list,
-> 1443                                             run_metadata)
   1444 

NotFoundError: 2 root error(s) found.
  (0) Not found: FindFirstFile failed for: OCT2017/train : The system cannot find the path specified.
; No such process
     [[{{node list_files/MatchingFiles}}]]
     [[MultiDeviceIteratorInit/_801]]
  (1) Not found: FindFirstFile failed for: OCT2017/train : The system cannot find the path specified.
; No such process
     [[{{node list_files/MatchingFiles}}]]
0 successful operations.
1 derived errors ignored.

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
<ipython-input-26-e9b94ace2029> in <module>
     11                                          num_epochs=EPOCHS,
     12                                          prefetch_buffer_size=4),
---> 13                 hooks=[time_hist])

~\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
    368 
    369       saving_listeners = _check_listeners_type(saving_listeners)
--> 370       loss = self._train_model(input_fn, hooks, saving_listeners)
    371       logging.info('Loss for final step: %s.', loss)
    372       return self

~\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
   1157   def _train_model(self, input_fn, hooks, saving_listeners):
   1158     if self._train_distribution:
-> 1159       return self._train_model_distributed(input_fn, hooks, saving_listeners)
   1160     else:
   1161       return self._train_model_default(input_fn, hooks, saving_listeners)

~\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model_distributed(self, input_fn, hooks, saving_listeners)
   1220       self._config._train_distribute.configure(self._config.session_config)
   1221       return self._actual_train_model_distributed(
-> 1222           self._config._train_distribute, input_fn, hooks, saving_listeners)
   1223     # pylint: enable=protected-access
   1224 

~\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _actual_train_model_distributed(self, strategy, input_fn, hooks, saving_listeners)
   1331         return self._train_with_estimator_spec(estimator_spec, worker_hooks,
   1332                                                hooks, global_step_tensor,
-> 1333                                                saving_listeners)
   1334 
   1335   def _train_with_estimator_spec_distributed(self, estimator_spec, worker_hooks,

~\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks, global_step_tensor, saving_listeners)
   1488         config=self._session_config,
   1489         max_wait_secs=self._config.session_creation_timeout_secs,
-> 1490         log_step_count_steps=log_step_count_steps) as mon_sess:
   1491       loss = None
   1492       any_step_done = False

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\training\monitored_session.py in MonitoredTrainingSession(master, is_chief, checkpoint_dir, scaffold, hooks, chief_only_hooks, save_checkpoint_secs, save_summaries_steps, save_summaries_secs, config, stop_grace_period_secs, log_step_count_steps, max_wait_secs, save_checkpoint_steps, summary_dir)
    582       session_creator=session_creator,
    583       hooks=all_hooks,
--> 584       stop_grace_period_secs=stop_grace_period_secs)
    585 
    586 

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\training\monitored_session.py in __init__(self, session_creator, hooks, stop_grace_period_secs)
   1012         hooks,
   1013         should_recover=True,
-> 1014         stop_grace_period_secs=stop_grace_period_secs)
   1015 
   1016 

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\training\monitored_session.py in __init__(self, session_creator, hooks, should_recover, stop_grace_period_secs)
    723         stop_grace_period_secs=stop_grace_period_secs)
    724     if should_recover:
--> 725       self._sess = _RecoverableSession(self._coordinated_creator)
    726     else:
    727       self._sess = self._coordinated_creator.create_session()

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\training\monitored_session.py in __init__(self, sess_creator)
   1205     """
   1206     self._sess_creator = sess_creator
-> 1207     _WrappedSession.__init__(self, self._create_session())
   1208 
   1209   def _create_session(self):

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\training\monitored_session.py in _create_session(self)
   1210     while True:
   1211       try:
-> 1212         return self._sess_creator.create_session()
   1213       except _PREEMPTION_ERRORS as e:
   1214         logging.info(

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\training\monitored_session.py in create_session(self)
    883       # Inform the hooks that a new session has been created.
    884       for hook in self._hooks:
--> 885         hook.after_create_session(self.tf_sess, self.coord)
    886       return _CoordinatedSession(
    887           _HookedSession(self.tf_sess, self._hooks), self.coord,

~\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\util.py in after_create_session(***failed resolving arguments***)
    102   def after_create_session(self, session, coord):
    103     del coord
--> 104     session.run(self._initializer)
    105 
    106 

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    954     try:
    955       result = self._run(None, fetches, feed_dict, options_ptr,
--> 956                          run_metadata_ptr)
    957       if run_metadata:
    958         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1178     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1179       results = self._do_run(handle, final_targets, final_fetches,
-> 1180                              feed_dict_tensor, options, run_metadata)
   1181     else:
   1182       results = []

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1357     if handle is None:
   1358       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1359                            run_metadata)
   1360     else:
   1361       return self._do_call(_prun_fn, handle, feeds, fetches)

~\.conda\envs\2020\lib\site-packages\tensorflow_core\python\client\session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385 
   1386   def _extend_graph(self):

NotFoundError: 2 root error(s) found.
  (0) Not found: FindFirstFile failed for: OCT2017/train : The system cannot find the path specified.
; No such process
     [[node list_files/MatchingFiles (defined at C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
     [[MultiDeviceIteratorInit/_801]]
  (1) Not found: FindFirstFile failed for: OCT2017/train : The system cannot find the path specified.
; No such process
     [[node list_files/MatchingFiles (defined at C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
1 derived errors ignored.

Original stack trace for 'list_files/MatchingFiles':
  File "C:\Users\NN\.conda\envs\2020\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\NN\.conda\envs\2020\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance
    app.start()
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\kernelapp.py", line 612, in start
    self.io_loop.start()
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "C:\Users\NN\.conda\envs\2020\lib\asyncio\base_events.py", line 442, in run_forever
    self._run_once()
  File "C:\Users\NN\.conda\envs\2020\lib\asyncio\base_events.py", line 1462, in _run_once
    handle._run()
  File "C:\Users\NN\.conda\envs\2020\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\ioloop.py", line 688, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback
    ret = callback()
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 814, in inner
    self.ctx_run(self.run)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\kernelbase.py", line 381, in dispatch_queue
    yield self.process_one()
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 250, in wrapper
    runner = Runner(ctx_run, result, future, yielded)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 741, in __init__
    self.ctx_run(self.run)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request
    user_expressions, allow_stdin,
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run
    return f(*args, **kw)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\ipkernel.py", line 306, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\IPython\core\interactiveshell.py", line 2867, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\IPython\core\interactiveshell.py", line 2895, in _run_cell
    return runner(coro)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\IPython\core\interactiveshell.py", line 3072, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\IPython\core\interactiveshell.py", line 3263, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-26-e9b94ace2029>", line 13, in <module>
    hooks=[time_hist])
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1159, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1222, in _train_model_distributed
    self._config._train_distribute, input_fn, hooks, saving_listeners)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1258, in _actual_train_model_distributed
    input_fn, ModeKeys.TRAIN, strategy)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1012, in _get_iterator_from_input_fn
    lambda input_context: self._call_input_fn(input_fn, mode,
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 1050, in make_input_fn_iterator
    input_fn, replication_mode)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 577, in make_input_fn_iterator
    input_fn, replication_mode=replication_mode)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\distribute\mirrored_strategy.py", line 552, in _make_input_fn_iterator
    self._container_strategy())
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\distribute\input_lib.py", line 719, in __init__
    result = input_fn(ctx)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1013, in <lambda>
    input_context))
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1116, in _call_input_fn
    return input_fn(**kwargs)
  File "<ipython-input-26-e9b94ace2029>", line 12, in <lambda>
    prefetch_buffer_size=4),
  File "<ipython-input-19-652953e0a3d5>", line 19, in input_fn
    dataset = tf.data.Dataset.list_files(file_pattern, shuffle=shuffle)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py", line 1864, in list_files
    return DatasetV1Adapter(DatasetV2.list_files(file_pattern, shuffle, seed))
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py", line 833, in list_files
    matching_files = gen_io_ops.matching_files(file_pattern)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\ops\gen_io_ops.py", line 464, in matching_files
    "MatchingFiles", pattern=pattern, name=name)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "C:\Users\NN\.conda\envs\2020\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
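The message "FindFirstFile failed for: OCT2017/train" suggests that the relative path was not found from the notebook's current working directory. Below is a minimal sketch of a check that could run before estimator.train. It assumes train_folder is the relative string "OCT2017/train"; the helper name resolve_data_dir and its base_dir parameter are hypothetical and not part of the tutorial.

```python
import os


def resolve_data_dir(folder, base_dir=None):
    """Return the absolute path of `folder`, failing early if it is missing.

    `resolve_data_dir` and `base_dir` are hypothetical names used only for
    this sketch; they do not come from the tutorial notebook.
    """
    # Relative paths are resolved against the working directory by default.
    base_dir = base_dir or os.getcwd()
    path = os.path.abspath(os.path.join(base_dir, folder))
    if not os.path.isdir(path):
        raise FileNotFoundError(
            "Dataset folder not found: {} (working directory: {})".format(
                path, os.getcwd()))
    return path
```

Calling train_folder = resolve_data_dir("OCT2017/train") before building the input function would raise a readable error naming the absolute path that was tried, instead of failing later inside the MatchingFiles op.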

On a positive note, multiple GPUs are working. I have updated the code as follows to try to resolve the error.

Code block 1:

import os
import time

# Set this before TensorFlow creates the GPU devices.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import numpy as np
#!pip install -q -U tensorflow-gpu
import tensorflow as tf
from tensorflow.compat.v1 import ConfigProto

config = ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
config.log_device_placement = True      # log the device each op is placed on
sess = tf.compat.v1.Session(config=config)

In addition, every call to tf.contrib.distribute.MirroredStrategy(num_gpus=NUM_GPUS) is replaced with tf.contrib.distribute.MirroredStrategy(num_gpus=NUM_GPUS, cross_device_ops=tf.distribute.HierarchicalCopyAllReduce()). I have also changed the batch size to see whether that resolves the error, but none of these changes work.

My environment setup:

tensorflow                1.15.0                   pypi_0    pypi
tensorflow-estimator      1.15.1                   pypi_0    pypi
tensorflow-gpu            1.15.0                   pypi_0    pypi

CUDA Detect Output:

Found 2 CUDA devices
id 0    b'GeForce GTX 1080 Ti'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 9
id 1    b'GeForce GTX 1080 Ti'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 10
Summary:
        2/2 devices are supported