tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory #60666

Closed Pranil51 closed 1 year ago

Pranil51 commented 1 year ago
### Issue Type

Others

### Have you reproduced the bug with TF nightly?

No

### Source

source

### Tensorflow Version

2.12.0

### Custom Code

Yes

### OS Platform and Distribution

Windows 11

### Mobile device

_No response_

### Python version

3.10.11

### Bazel version

_No response_

### GCC/Compiler version

_No response_

### CUDA/cuDNN version

11.8

### GPU model and memory

T4

### Current Behaviour?

I had been training the faster_rcnn_resnet50_v1_640x640_coco17_tpu-8 model on my custom dataset in Colab. All paths are set correctly in the config file. The issue occurs on both GPU and CPU.

    fine_tune_checkpoint: "/content/drive/MyDrive/Obj_Detection/Faster_RCNN/data/pretrained_model/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8/checkpoint/ckpt-0"
    fine_tune_checkpoint_type: "detection"
    data_augmentation_options {
      random_horizontal_flip {
      }
    }
    max_number_of_boxes: 100
    unpad_groundtruth_tensors: false
    use_bfloat16: true # works only on TPUs
    }
    train_input_reader: {
      label_map_path: "/content/drive/MyDrive/Obj_Detection/Faster_RCNN/data/label_map.pbtxt"
      tf_record_input_reader {
        input_path: "/content/drive/MyDrive/Obj_Detection/Faster_RCNN/data/train/*.tfrecord"
      }
    }
    eval_config: {
      metrics_set: "coco_detection_metrics"
      use_moving_averages: false
      batch_size: 1;
    }
    eval_input_reader: {
      label_map_path: "/content/drive/MyDrive/Obj_Detection/Faster_RCNN/data/label_map.pbtxt"
      shuffle: false
      num_epochs: 1
      tf_record_input_reader {
        input_path: "/content/drive/MyDrive/Obj_Detection/Faster_RCNN/data/val/val.tfrecord"
      }
    }

It's probably the config file, but I have set all parameters correctly and copy-pasted absolute paths.
### Standalone code to reproduce the issue

```shell
https://colab.research.google.com/drive/16dQA4FzrNhMlV30qo5ofFH3Oj4WbtNP0?usp=sharing

PIPELINE_CONFIG_PATH='/content/drive/MyDrive/Obj_Detection/Faster_RCNN/Models/training_process/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config'
MODEL_DIR='/content/drive/MyDrive/Obj_Detection/Faster_RCNN/Models'

# in the next cell
%%shell
cd /content
python /content/drive/MyDrive/Obj_Detection/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --alsologtostderr
```

### Relevant log output

```shell
TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
For more information see: https://github.com/tensorflow/addons/issues/2807
  warnings.warn(
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
W0523 02:39:16.601438 139836257871680 cross_device_ops.py:1387] There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0523 02:39:16.650524 139836257871680 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
Traceback (most recent call last):
  File "/content/drive/MyDrive/Obj_Detection/models/research/object_detection/model_main_tf2.py", line 114, in
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/content/drive/MyDrive/Obj_Detection/models/research/object_detection/model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 505, in train_loop
    configs = get_configs_from_pipeline_file(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/utils/config_util.py", line 138, in get_configs_from_pipeline_file
    proto_str = f.read()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
    self._preread_check()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
 in ()
----> 1 get_ipython().run_cell_magic('shell', '', 'cd /content\npython /content/drive/MyDrive/Obj_Detection/models/research/object_detection/model_main_tf2.py \\\n --pipeline_config_path=${PIPELINE_CONFIG_PATH} \\\n --model_dir=${MODEL_DIR} \\\n --alsologtostderr\n')

3 frames
/usr/local/lib/python3.10/dist-packages/google/colab/_system_commands.py in check_returncode(self)
    135   def check_returncode(self):
    136     if self.returncode:
--> 137       raise subprocess.CalledProcessError(
    138           returncode=self.returncode, cmd=self.args, output=self.output
    139       )

CalledProcessError: Command 'cd /content
python /content/drive/MyDrive/Obj_Detection/models/research/object_detection/model_main_tf2.py \
 --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
 --model_dir=${MODEL_DIR} \
 --alsologtostderr
' returned non-zero exit status 1.
```
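
The `NotFoundError` in the log prints an empty file name before `; No such file or directory`. One possible reading (an assumption, not something confirmed in this thread) is that the script received an empty `--pipeline_config_path`, for example because `${PIPELINE_CONFIG_PATH}` was not visible to the shell cell. A minimal sketch that checks the path in Python and builds the command explicitly follows. `build_train_command` is a hypothetical helper, not part of the Object Detection API, and it uses only the three flags shown in the issue.

```python
import os
import sys


def build_train_command(script_path, pipeline_config_path, model_dir):
    """Return an argv list for the training script, failing early on a bad config path.

    Hypothetical helper for illustration only. The flags are the ones used in
    the issue: --pipeline_config_path, --model_dir and --alsologtostderr.
    """
    if not pipeline_config_path:
        # An empty value is what an unset shell variable would expand to.
        raise ValueError("pipeline_config_path is empty")
    if not os.path.isfile(pipeline_config_path):
        raise FileNotFoundError(
            f"pipeline config not found: {pipeline_config_path!r}"
        )
    return [
        sys.executable,
        script_path,
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",
    ]
```

The returned list could then be passed to `subprocess.run(cmd, check=True)`, which avoids relying on shell variable expansion at all.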
SuryanarayanaY commented 1 year ago

Hi @Pranil51 ,

TensorFlow does not support GPU on native Windows for versions TF>=2.11. You have to use WSL2 to enable GPU support. Please refer to the source for more details.

However, since you confirmed that you face the same problem on CPU as well, I would like more context to look into it. The attached Colab link has many dependencies, such as data fetched from your Drive and custom models, so it is not sufficient to debug the issue. This might need access to the source code related to TF and how it has been implemented. I request you to submit a minimal code snippet that reproduces the issue.

There seems to be an issue with the distribution strategy you have implemented. Providing more context with a code snippet may enable us to dig into the issue and resolve it.

Thanks!

Pranil51 commented 1 year ago

Hello @SuryanarayanaY, thanks for the response. I set up TensorFlow in Colab as usual, using only `!pip install tensorflow`. Also, I have not hit this issue on Windows, only on Colab.

SuryanarayanaY commented 1 year ago

Hi @Pranil51 ,

Could you please confirm the model you are using and how you have configured it? I am unable to replicate the issue in Colab with the attached gist.
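
One way to confirm how the pipeline is configured is to check that every absolute path quoted in the config file resolves to something on disk. The sketch below is a plain-text scan, not a protobuf parse; `find_missing_paths` is a hypothetical helper, and it assumes POSIX-style paths written in double quotes, as in the config excerpt from the issue.

```python
import glob
import re


def find_missing_paths(config_text):
    """Return quoted absolute paths in config_text that match nothing on disk.

    Paths with wildcards (e.g. ".../*.tfrecord") are expanded with glob.
    Other paths are matched as "<path>*", because a TF2 checkpoint prefix
    such as ".../ckpt-0" is stored as several files sharing that prefix.
    This prefix match is deliberately loose and may accept near-misses.
    """
    missing = []
    for path in re.findall(r'"(/[^"]*)"', config_text):
        has_wildcard = any(ch in path for ch in "*?[")
        pattern = path if has_wildcard else glob.escape(path) + "*"
        if not glob.glob(pattern):
            missing.append(path)
    return missing
```

Calling it on the text of the `.config` file returns the paths worth double-checking; an empty list means every quoted path matched at least one file.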

Thanks!

Pranil51 commented 1 year ago

I solved the error by copying the project's relevant directories from Google Drive to the /content folder.
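
The thread does not say which directories were copied or how. A minimal sketch of such a copy using only the Python standard library follows; `copy_to_local` is a hypothetical helper, and any paths passed to it are assumptions based on the ones quoted in the issue.

```python
import os
import shutil


def copy_to_local(src_dir, dst_root):
    """Copy src_dir (e.g. a folder on a mounted Drive) under dst_root.

    Returns the path of the copied directory. Hypothetical helper for
    illustration only.
    """
    name = os.path.basename(os.path.normpath(src_dir))
    dst_dir = os.path.join(dst_root, name)
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run without failing.
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    return dst_dir
```

For example, `copy_to_local("/content/drive/MyDrive/Obj_Detection/Faster_RCNN/data", "/content")` would create `/content/data`. The paths inside the pipeline config would then need to point at the copied location.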

SuryanarayanaY commented 1 year ago

Hi @Pranil51 ,

If the issue is resolved, could you please spare some time to close it? Thanks!
