tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0
Container component example fails because placeholder dereferences None #4001

Closed codesue closed 2 years ago

codesue commented 3 years ago

System information

Describe the current behavior

The download-grep-print example pipeline fails because the url placeholder dereferences None, leading to a missing URL. I'm using this example as inspiration to write my own container-based component, and I'm having similar issues.

Describe the expected behavior

The url placeholder should dereference to a string, e.g. "https://raw.githubusercontent.com/karpathy/char-rnn/370cbcd/data/tinyshakespeare/input.txt" when running the pipeline defined in the example's test.

Standalone code to reproduce the issue

components: https://github.com/tensorflow/tfx/blob/v0.30.1/tfx/examples/custom_components/container_components/download_grep_print_pipeline.py
pipeline: https://github.com/tensorflow/tfx/blob/v0.30.1/tfx/examples/custom_components/container_components/download_grep_print_pipeline_on_beam_test.py

Install tfx[examples] or copy the components and pipeline definitions. Then run the run_pipeline_on_beam() function defined in the pipeline file.

Name of your Organization (Optional)

Twitter

Other info / logs

...

WARNING:absl:Dereferenced None during placeholder evaluation. Ignoring.
WARNING:absl:Placeholder=key: "url"

INFO:absl:Container spec: {'image': 'google/cloud-sdk:278.0.0', 'command': ['sh', '-exc', '\n          url="$0"\n          output_data_uri="$1"/data  # TODO(b/150515270) Remove when fixed.\n          output_data_path=$(mktemp)\n\n          # Running the main code\n          wget "$0" -O "$output_data_path" || curl "$0" > "$output_data_path"\n\n          # Getting data out of the container\n          gsutil cp "$output_data_path" "$output_data_uri"\n        ', None, '/Users/codesue/tfx_root/pipelines/download-grep-print-pipelin/DownloadFromHttp/data/9'], 'args': []}
INFO:absl:Docker platform config: {}
INFO:absl:Docker: + url=

INFO:absl:Docker: + output_data_uri=/Users/codesue/tfx_root/pipelines/download-grep-print-pipelin/DownloadFromHttp/data/9/data

INFO:absl:Docker: + mktemp

INFO:absl:Docker: + output_data_path=/tmp/tmp.ApsavivuJH

INFO:absl:Docker: + wget  -O /tmp/tmp.ApsavivuJH

INFO:absl:Docker: : 7: : wget: not found

INFO:absl:Docker: + curl 

INFO:absl:Docker: curl: (3) URL using bad/illegal format or missing URL

...

RuntimeError: Container exited with error code "3" [while running 'Run[DownloadFromHttp]']
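
For anyone debugging a similar failure, here is a minimal sketch of a check that flags an unresolved placeholder before the container starts. It is not part of TFX: find_unresolved_args and resolved_command are hypothetical names, and the list only mirrors the shape of the 'command' entry in the "Container spec" log line above, with the shell script shortened to a stand-in string.

```python
from typing import List, Optional, Sequence


def find_unresolved_args(command: Sequence[Optional[str]]) -> List[int]:
    """Return the positions of command entries that are still None."""
    return [i for i, arg in enumerate(command) if arg is None]


# Shape of the resolved command from the "Container spec" log line
# (script body replaced by a stand-in string; this is an illustration,
# not output captured from TFX).
resolved_command = [
    "sh",
    "-exc",
    "<shell script>",
    None,  # where the url placeholder should have resolved to a string
    "/Users/codesue/tfx_root/pipelines/download-grep-print-pipelin/DownloadFromHttp/data/9",
]

unresolved = find_unresolved_args(resolved_command)
if unresolved:
    print(f"Unresolved placeholder(s) at command index: {unresolved}")
```

With the command from the log, the None sits at index 3, which is the argument the script reads as "$0". That matches the empty "+ url=" line in the Docker trace.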

chongkong commented 3 years ago

Sorry, I can't reproduce the error (I used Python 3.7.7 and tfx==0.30.1 on macOS 11.0.4). I have no idea why this is failing. @kennethyang404, do you have any clue from the error log?
