tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0

TFX pipeline transform component gives "Failed to create a directory / no such file or directory" error #5917

Closed metallewalin closed 1 year ago

metallewalin commented 1 year ago

All the previous components, i.e. ExampleGen, StatisticsGen, SchemaGen and ExampleValidator, worked just fine. But the line context.run(transform), run with the interactive context, throws this error:

NotFoundError: Failed to create a directory: pipeline/Transform\updated_analyzer_cache\5/pipeline-CsvExampleGen-examples-1-Split-train-STAR-ce405af691cebcbd40d9e1fa98e172faa0aa23b8c9a92be2c984175daededa0c; No such file or directory

Full stack trace:

```
NotFoundError                             Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7008\1422273313.py in ()
      9
     10 # Run the component
---> 11 context.run(transform)

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\orchestration\experimental\interactive\interactive_context.py in run_if_ipython(*args, **kwargs)
     61 # __IPYTHON__ variable is set by IPython, see
     62 # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 63 return fn(*args, **kwargs)
     64 else:
     65 absl.logging.warning(

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\orchestration\experimental\interactive\interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
    181 telemetry_utils.LABEL_TFX_RUNNER: runner_label,
    182 }):
--> 183 execution_id = launcher.launch().execution_id
    184
    185 return execution_result.ExecutionResult(

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\orchestration\launcher\base_component_launcher.py in launch(self)
    198 # be immutable in this context.
    199 # output_dict can still be changed, specifically properties.
--> 200 self._run_executor(execution_decision.execution_id,
    201 copy.deepcopy(execution_decision.input_dict),
    202 execution_decision.output_dict,

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\orchestration\launcher\in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
     71 # be immutable in this context.
     72 # output_dict can still be changed, specifically properties.
---> 73 executor.Do(
     74 copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\components\transform\executor.py in Do(self, input_dict, output_dict, exec_properties)
    528 label_outputs[labels.CACHE_OUTPUT_PATH_LABEL] = cache_output
    529 status_file = 'status_file' # Unused
--> 530 self.Transform(label_inputs, label_outputs, status_file)
    531 logging.debug('Cleaning up temp path %s on executor success', temp_path)
    532 io_utils.delete_dir(temp_path)

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\components\transform\executor.py in Transform(***failed resolving arguments***)
   1126 materialization_format = (
   1127 transform_paths_file_formats[-1] if materialize_output_paths else None)
-> 1128 self._RunBeamImpl(analyze_data_list, transform_data_list, preprocessing_fn,
   1129 stats_options_updater_fn, force_tf_compat_v1,
   1130 input_dataset_metadata, transform_output_path,

~\miniconda3\envs\MLOPs\lib\site-packages\tfx\components\transform\executor.py in _RunBeamImpl(self, analyze_data_list, transform_data_list, preprocessing_fn, stats_options_updater_fn, force_tf_compat_v1, input_dataset_metadata, transform_output_path, raw_examples_data_format, temp_path, input_cache_dir, output_cache_dir, disable_statistics, per_set_stats_output_paths, materialization_format, analyze_paths_count, stats_output_paths)
   1319 # 2.26 is used.
   1320 if cache_output:
-> 1321 (cache_output
   1322 | 'WriteCache' >> analyzer_cache.WriteAnalysisCacheToFS(
   1323 pipeline=pipeline,

~\miniconda3\envs\MLOPs\lib\site-packages\apache_beam\transforms\ptransform.py in __ror__(self, pvalueish, _unused)
   1089
   1090 def __ror__(self, pvalueish, _unused=None):
-> 1091 return self.transform.__ror__(pvalueish, self.label)
   1092
   1093 def expand(self, pvalue):

~\miniconda3\envs\MLOPs\lib\site-packages\apache_beam\transforms\ptransform.py in __ror__(self, left, label)
    604 pvalueish = _SetInputPValues().visit(pvalueish, replacements)
    605 self.pipeline = p
--> 606 result = p.apply(self, pvalueish, label)
    607 if deferred:
    608 return result

~\miniconda3\envs\MLOPs\lib\site-packages\apache_beam\pipeline.py in apply(self, transform, pvalueish, label)
    649 try:
    650 old_label, transform.label = transform.label, label
--> 651 return self.apply(transform, pvalueish)
    652 finally:
    653 transform.label = old_label

~\miniconda3\envs\MLOPs\lib\site-packages\apache_beam\pipeline.py in apply(self, transform, pvalueish, label)
    692 transform.type_check_inputs(pvalueish)
    693
--> 694 pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
    695
    696 if type_options is not None and type_options.pipeline_type_check:

~\miniconda3\envs\MLOPs\lib\site-packages\apache_beam\runners\runner.py in apply(self, transform, input, options)
    183 m = getattr(self, 'apply_%s' % cls.__name__, None)
    184 if m:
--> 185 return m(transform, input, options)
    186 raise NotImplementedError(
    187 'Execution of [%s] not implemented in runner %s.' % (transform, self))

~\miniconda3\envs\MLOPs\lib\site-packages\apache_beam\runners\runner.py in apply_PTransform(self, transform, input, options)
    213 def apply_PTransform(self, transform, input, options):
    214 # The base case of apply is to call the transform's expand.
--> 215 return transform.expand(input)
    216
    217 def run_transform(self,

~\miniconda3\envs\MLOPs\lib\site-packages\tensorflow_transform\beam\analyzer_cache.py in expand(self, dataset_cache_dict)
    237 dataset_key)
    238
--> 239 with _ManifestFile(dataset_key_dir) as manifest_file:
    240 cache_is_written.extend(
    241 self._write_cache(manifest_file, dataset_key_idx, dataset_key_dir,

~\miniconda3\envs\MLOPs\lib\site-packages\tensorflow_transform\beam\analyzer_cache.py in __enter__(self)
    114
    115 def __enter__(self):
--> 116 self._open()
    117 return self
    118

~\miniconda3\envs\MLOPs\lib\site-packages\tensorflow_transform\beam\analyzer_cache.py in _open(self)
    101 assert self._file is None
    102 if not tf.io.gfile.isdir(self._base_path):
--> 103 tf.io.gfile.makedirs(self._base_path)
    104 self._file = tf.io.gfile.GFile(self._manifest_path, 'wb+')
    105

~\miniconda3\envs\MLOPs\lib\site-packages\tensorflow\python\lib\io\file_io.py in recursive_create_dir_v2(path)
    512 errors.OpError: If the operation fails.
    513 """
--> 514 _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
    515
    516

NotFoundError: Failed to create a directory: pipeline/Transform\updated_analyzer_cache\6/pipeline-CsvExampleGen-examples-1-Split-train-STAR-ce405af691cebcbd40d9e1fa98e172faa0aa23b8c9a92be2c984175daededa0c; No such file or directory
```

This is actually an ungraded lab from an online course. I installed the required packages with the specified versions: python=3.8, tfx=1.3, apache-beam[gcp]==2.32, tensorflow-data-validation=1.3, tensorflow=2.6, tensorflow-model-analysis=0.34.1, tensorflow-transform=1.3. Apparently I am missing something. Please help me out here!

thanks

singhniraj08 commented 1 year ago

@metallewalin,

The error suggests that the pipeline paths are not set up correctly, since the path in the error message contains both / and \ separators. Please make sure the pipeline path is set up as shown in the tutorial.
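
The mixed separators can be reproduced on any platform with Python's standard `ntpath` and `posixpath` modules. This is a minimal sketch, assuming the path fragments below (copied from the error message) are joined one after another; how TFX actually assembles the path is not shown in this thread.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

# Fragments copied from the error message; the join order is an assumption.
root = "pipeline/Transform"
parts = ("updated_analyzer_cache", "5")

# Joining with Windows rules keeps the existing "/" and adds "\" separators.
windows_joined = ntpath.join(root, *parts)
# Joining with POSIX rules uses "/" throughout.
posix_joined = posixpath.join(root, *parts)
# normpath with Windows rules rewrites every "/" as "\".
windows_normalized = ntpath.normpath(windows_joined)

print(windows_joined)      # pipeline/Transform\updated_analyzer_cache\5
print(posix_joined)        # pipeline/Transform/updated_analyzer_cache/5
print(windows_normalized)  # pipeline\Transform\updated_analyzer_cache\5
```

Windows itself generally accepts both separators, so a mixed path like this is a hint about the platform rather than proof of the failure's cause.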

If the issue persists, please help us with the code to replicate the issue and environment in which the code is executed. Thanks.

metallewalin commented 1 year ago
_pipeline_root = './pipeline'
_data_root = './data'
_data_filepath = os.path.join(_data_root, 'metro_traffic_volume.csv')

Well, these are my pipeline and data root path settings. The thing is, all the preceding components work just fine with these settings. Only the Transform step throws that strange separator error. In the tutorial, InteractiveContext() is called without a pipeline argument, which is not the case in my situation. How should I set up the pipeline path in that case? Could it be because of the tfx and tensorflow versions? I will try with a higher tfx version.
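
One thing that could be tried here is resolving the roots to absolute, normalized paths before handing them to the context. This is a minimal sketch, assuming the `pipeline_root` keyword of `InteractiveContext` is the pipeline argument mentioned above; it only normalizes the paths and is not a confirmed fix for this error.

```python
import os

# Same relative settings as in the post above.
_pipeline_root = './pipeline'
_data_root = './data'

# Resolve to absolute, normalized paths using the current OS conventions.
pipeline_root_abs = os.path.abspath(_pipeline_root)
data_filepath_abs = os.path.abspath(
    os.path.join(_data_root, 'metro_traffic_volume.csv'))

print(pipeline_root_abs)
print(data_filepath_abs)

# Assumed usage (requires tfx to be installed):
# context = InteractiveContext(pipeline_root=pipeline_root_abs)
```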

thanks

singhniraj08 commented 1 year ago

@metallewalin,

If you are running TFX on Windows: TFX currently has no plans to support Windows. A few users have reported success with a workaround as mentioned here, which is to update the code by adding self._pipeline_run_id.replace(':', '_') in the get_stateful_working_directory function in the tfx\orchestration\portable\outputs_utils.py file.
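
The replacement matters because `:` is not allowed in Windows file and directory names. This is a minimal sketch of the idea, assuming a timestamp-style run ID; the helper name `sanitize_run_id` and the sample value are hypothetical, and the real change is made inside TFX's own `get_stateful_working_directory`.

```python
def sanitize_run_id(pipeline_run_id: str) -> str:
    """Replace ':' so the ID can be used as a Windows directory name."""
    return pipeline_run_id.replace(':', '_')


# Hypothetical run ID; real IDs are generated by the orchestrator.
example_run_id = '2023-05-01T12:30:45.123456'
print(sanitize_run_id(example_run_id))  # 2023-05-01T12_30_45.123456
```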

Please try the above steps and let us know if that works. If not please help us with the environment where you are running your code. Thanks.

metallewalin commented 1 year ago

@singhniraj08 ,

Yes, I am running TFX on Windows 8.1. I wanted to try your solution, but I had already uninstalled the Miniconda installation that produced the path error above. I have now installed the latest version of Anaconda and tried to install tfx 1.12, but this time it takes ages because the installer goes through the cached numpy versions (<2, >=1.16) and downloads them one by one. Is this normal? My previous tfx installation did not take nearly as long. Long story short, I cannot even try your solution without tfx installed. Any suggestions for making the tfx installation faster, or for another way to install it? Thanks in advance.

singhniraj08 commented 1 year ago

@metallewalin,

TFX currently has no plans to support Windows, so I am not sure whether TFX will work on Windows 8.1. However, a few users have reported success running TFX on Linux under WSL on Windows 10. You can refer to a similar issue for the exact steps to install TFX on Linux WSL on Windows 10. Thank you.

Note: WSL is not supported on Windows 8.1, so you would need to update Windows OS.
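
To confirm which of these environments a notebook kernel is actually running in, the standard `platform` module can be queried. This is a minimal sketch; the function name is hypothetical, and the WSL check is a common heuristic (looking for "microsoft" in the kernel release string), not an official API.

```python
import platform


def describe_platform() -> str:
    """Return 'windows', 'wsl', 'linux', or 'other' for the running interpreter."""
    system = platform.system()
    if system == 'Windows':
        return 'windows'
    if system == 'Linux':
        # Heuristic (assumption): WSL kernels usually report "microsoft"
        # somewhere in the release string.
        if 'microsoft' in platform.release().lower():
            return 'wsl'
        return 'linux'
    return 'other'


print(describe_platform())
```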

metallewalin commented 1 year ago

@singhniraj08, thanks a lot. I will try this out, too. I hope to report some success soon.

metallewalin commented 1 year ago

OK, I have some news. I upgraded from Windows 8.1 to Windows 10 Pro and installed Ubuntu 22.04.2 LTS. After creating an environment with python=3.9, I tried again to install tfx. The previous problem of backtracking through cached numpy versions one by one seems to be gone this time. However, a new error has occurred:

Building wheels for collected packages: pyfarmhash
  Building wheel for pyfarmhash (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      running bdist_wheel
      running build
      running build_ext
      building 'farmhash' extension
      creating build
      creating build/temp.linux-x86_64-cpython-39
      creating build/temp.linux-x86_64-cpython-39/src
      gcc -pthread -B /home/linuxformlops/miniconda3/envs/env-MLOPs/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/linuxformlops/miniconda3/envs/env-MLOPs/include -I/home/linuxformlops/miniconda3/envs/env-MLOPs/include -fPIC -O2 -isystem /home/linuxformlops/miniconda3/envs/env-MLOPs/include -fPIC -I/home/linuxformlops/miniconda3/envs/env-MLOPs/include/python3.9 -c src/farmhash.cc -o build/temp.linux-x86_64-cpython-39/src/farmhash.o -O4
      error: command 'gcc' failed: No such file or directory
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pyfarmhash
  Running setup.py clean for pyfarmhash
Failed to build pyfarmhash
ERROR: Could not build wheels for pyfarmhash, which is required to install pyproject.toml-based projects

I installed wheel but still get the same error while installing pyfarmhash. It seems to be a kind of vicious circle.

singhniraj08 commented 1 year ago

@metallewalin,

Can you please share the steps you are using to install TFX? I didn't come across this issue while installing TFX. Thanks.

metallewalin commented 1 year ago

The problem is now solved. I hadn't installed the gcc compiler in my Linux distribution. After installing it with the following commands, I could install tfx with the normal commands.

sudo apt-get update
sudo apt install build-essential
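
Before retrying the install, the presence of the compiler toolchain can be checked from Python. This is a minimal sketch using the standard `shutil.which`; the function name and the default tool list (gcc, g++ and make, which build-essential provides) are assumptions chosen for illustration.

```python
import shutil


def missing_build_tools(tools=('gcc', 'g++', 'make')):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print('Missing build tools:', ', '.join(missing))
else:
    print('gcc, g++ and make are available')
```

If gcc shows up as missing, source builds such as the pyfarmhash wheel above will fail with the same "command 'gcc' failed" message.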

Thank you a lot for taking part in this issue and for the help.

singhniraj08 commented 1 year ago

@metallewalin,

Could you please close this issue if this issue is resolved for you. Thanks.

paul-lestyo commented 1 year ago

I had the same issue. It is caused by a Windows compatibility problem. I solved it by enabling long paths in Windows 10.
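
Without long-path support, most Windows file APIs limit a full path to 260 characters (`MAX_PATH`). This is a minimal sketch for checking how close a cache directory comes to that limit; the base directory is hypothetical, and the thread does not confirm that path length caused the original error.

```python
import ntpath

MAX_PATH = 260  # classic Windows limit, including the terminating null character


def exceeds_max_path(base_dir: str, relative_path: str) -> bool:
    """Return True if the joined Windows path would not fit within MAX_PATH."""
    full_path = ntpath.normpath(ntpath.join(base_dir, relative_path))
    # One character is reserved for the terminating null.
    return len(full_path) + 1 > MAX_PATH


# Relative path copied from the error message in this issue.
cache_dir = (
    'pipeline/Transform\\updated_analyzer_cache\\5/'
    'pipeline-CsvExampleGen-examples-1-Split-train-STAR-'
    'ce405af691cebcbd40d9e1fa98e172faa0aa23b8c9a92be2c984175daededa0c'
)
# Hypothetical notebook location.
base_dir = 'C:\\Users\\example\\notebooks'

print(len(cache_dir))
print(exceeds_max_path(base_dir, cache_dir))
```

The cache directory name alone is long, so a deeply nested working directory plus the file names written inside it can approach the limit quickly.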