tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0
2.12k stars 711 forks

Evaluator Module_File install_to_temp_directory Failure #6920

Closed adammkerr closed 1 month ago

adammkerr commented 1 month ago

System information

absl-py==1.4.0
annotated-types==0.7.0
anyio==4.4.0
apache-beam==2.58.1
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
astunparse==1.6.3
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
babel==2.16.0
backcall==0.2.0
beautifulsoup4==4.12.3
bleach==6.1.0
cachetools==5.5.0
certifi==2024.7.4
cffi==1.17.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
colorama==0.4.6
comm==0.2.2
crcmod==1.7
debugpy==1.8.5
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
dill==0.3.1.1
dnspython==2.6.1
docker==4.4.4
docopt==0.6.2
docstring_parser==0.16
exceptiongroup==1.2.2
fastavro==1.9.5
fasteners==0.19
fastjsonschema==2.20.0
fire==0.6.0
flatbuffers==24.3.25
fqdn==1.5.1
gast==0.6.0
google-api-core==2.19.2
google-api-python-client==1.12.11
google-apitools==0.5.31
google-auth==2.34.0
google-auth-httplib2==0.2.0
google-auth-oauthlib==1.2.1
google-cloud-aiplatform==1.64.0
google-cloud-bigquery==3.25.0
google-cloud-bigquery-storage==2.25.0
google-cloud-bigtable==2.26.0
google-cloud-core==2.4.1
google-cloud-datastore==2.20.1
google-cloud-dlp==3.22.0
google-cloud-language==2.14.0
google-cloud-pubsub==2.23.0
google-cloud-pubsublite==1.11.1
google-cloud-recommendations-ai==0.10.12
google-cloud-resource-manager==1.12.5
google-cloud-spanner==3.48.0
google-cloud-storage==2.18.2
google-cloud-videointelligence==2.13.5
google-cloud-vision==3.7.4
google-crc32c==1.5.0
google-pasta==0.2.0
google-resumable-media==2.7.2
googleapis-common-protos==1.65.0
grpc-google-iam-v1==0.13.1
grpc-interceptor==0.15.4
grpcio==1.66.0
grpcio-status==1.48.2
h11==0.14.0
h5py==3.11.0
hdfs==2.7.3
httpcore==1.0.5
httplib2==0.22.0
httpx==0.27.2
idna==3.8
ipykernel==6.29.5
ipython==7.34.0
ipython-genutils==0.2.0
ipywidgets==7.8.3
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.4
joblib==1.4.2
Js2Py==0.74
json5==0.9.25
jsonpickle==3.2.2
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==1.1.9
keras==2.15.0
keras-tuner==1.4.7
kfp==1.8.22
kfp-pipeline-spec==0.1.16
kfp-server-api==1.8.5
kt-legacy==1.0.5
kubernetes==12.0.1
libclang==18.1.1
lxml==5.3.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
ml-dtypes==0.3.2
ml-metadata==1.15.0
ml-pipelines-sdk==1.15.1
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
nltk==3.9.1
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
oauth2client==4.1.3
oauthlib==3.2.2
objsize==0.7.0
opt-einsum==3.3.0
orjson==3.10.7
overrides==7.7.0
packaging==24.1
pandas==1.5.3
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.4.0
platformdirs==4.2.2
portalocker==2.10.1
portpicker==1.6.0
prometheus_client==0.20.0
prompt_toolkit==3.0.47
proto-plus==1.24.0
protobuf==3.20.3
psutil==6.0.0
ptyprocess==0.7.0
pyarrow==10.0.1
pyarrow-hotfix==0.6
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
pydantic==1.10.18
pydantic_core==2.20.1
pydot==1.4.2
pyfarmhash==0.3.2
Pygments==2.18.0
pyjsparser==2.7.1
pymongo==4.8.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.2
pyzmq==26.2.0
redis==5.0.8
referencing==0.35.1
regex==2024.7.24
requests==2.31.0
requests-oauthlib==2.0.0
requests-toolbelt==0.10.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.8.0
rouge-score==0.1.2
rpds-py==0.20.0
rsa==4.9
sacrebleu==2.4.3
scipy==1.12.0
Send2Trash==1.8.3
shapely==2.0.6
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
sqlparse==0.5.1
strip-hints==0.1.10
tabulate==0.9.0
tensorboard==2.15.2
tensorboard-data-server==0.7.2
tensorflow==2.15.1
tensorflow-data-validation==1.15.1
tensorflow-estimator==2.15.0
tensorflow-hub==0.15.0
tensorflow-io-gcs-filesystem==0.37.1
tensorflow-metadata==1.15.0
tensorflow-serving-api==2.15.1
tensorflow-transform==1.15.0
tensorflow_model_analysis==0.46.0
termcolor==2.4.0
terminado==0.18.1
tfx==1.15.1
tfx-bsl==1.15.1
tinycss2==1.3.0
tomli==2.0.1
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
typer==0.12.5
types-python-dateutil==2.9.0.20240821
typing_extensions==4.12.2
tzlocal==5.2
uri-template==1.3.0
uritemplate==3.0.1
urllib3==1.26.19
wcwidth==0.2.13
webcolors==24.8.0
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==3.0.4
widgetsnbextension==3.6.8
wrapt==1.14.1
zstandard==0.23.0

Describe the current behavior I am attempting to provide a UDF for Evaluator customization for an sklearn pipeline, following the example here: https://github.com/tensorflow/tfx/blob/master/tfx/examples/penguin/experimental/README.md

Providing a module_file to the Evaluator component causes the pipeline to fail. The failure in the logs is:

2024-09-27 20:45:46.389
I0927 19:45:17.038211 132884941596480 executor.py:191] Using gs://prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/973046326318/sklearn-penguin-20240927190054/Trainer_-7943956117518811136/model/Format-Serving as model.
2024-09-27 20:45:46.389
I0927 19:45:17.483053 132884941596480 executor.py:236] The 'example_splits' parameter is not set, using 'eval' split.
2024-09-27 20:45:46.389
I0927 19:45:17.483273 132884941596480 executor.py:239] Evaluating model.
2024-09-27 20:45:46.389
I0927 19:45:17.484107 132884941596480 udf_utils.py:340] Installing 'gs://prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+a73a01923eac3e23d9a8b55381b25fdb3e96843b6c686aff9fbec63879e9f5f9-py3-none-any.whl' to a temporary directory.
2024-09-27 20:45:46.389
I0927 19:45:17.484263 132884941596480 udf_utils.py:347] Executing: ['/opt/conda/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpdwd04iqq', 'gs://prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+a73a01923eac3e23d9a8b55381b25fdb3e96843b6c686aff9fbec63879e9f5f9-py3-none-any.whl']
2024-09-27 20:45:46.389
WARNING: Requirement 'gs://prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+a73a01923eac3e23d9a8b55381b25fdb3e96843b6c686aff9fbec63879e9f5f9-py3-none-any.whl' looks like a filename, but the file does not exist
2024-09-27 20:45:46.389
Processing ./gs:/prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+a73a01923eac3e23d9a8b55381b25fdb3e96843b6c686aff9fbec63879e9f5f9-py3-none-any.whl
2024-09-27 20:45:46.389
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/pipeline/gs:/prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+a73a01923eac3e23d9a8b55381b25fdb3e96843b6c686aff9fbec63879e9f5f9-py3-none-any.whl'
2024-09-27 20:45:46.389
{levelname: ERROR}
2024-09-27 20:45:46.389
Traceback (most recent call last):
2024-09-27 20:45:46.389
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-09-27 20:45:46.389
return _run_code(code, main_globals, None,
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
2024-09-27 20:45:46.390
exec(code, run_globals)
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/tfx/orchestration/kubeflow/v2/container/kubeflow_v2_run_executor.py", line 233, in <module>
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
2024-09-27 20:45:46.390
sys.exit(main(argv))
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/tfx/orchestration/kubeflow/v2/container/kubeflow_v2_run_executor.py", line 229, in main
2024-09-27 20:45:46.390
_run_executor(args, beam_args)
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/tfx/orchestration/kubeflow/v2/container/kubeflow_v2_run_executor.py", line 135, in _run_executor
2024-09-27 20:45:46.390
executor.Do(inputs, outputs, exec_properties)
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/tfx/components/evaluator/executor.py", line 244, in Do
2024-09-27 20:45:46.390
with udf_utils.TempPipInstallContext(extra_pip_packages):
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/tfx/components/util/udf_utils.py", line 307, in __enter__
2024-09-27 20:45:46.390
install_to_temp_directory(dependency, temp_dir=self.temp_directory)
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/site-packages/tfx/components/util/udf_utils.py", line 348, in install_to_temp_directory
2024-09-27 20:45:46.390
subprocess.check_call(install_command)
2024-09-27 20:45:46.390
File "/opt/conda/lib/python3.10/subprocess.py", line 369, in check_call
2024-09-27 20:45:46.390
raise CalledProcessError(retcode, cmd)
2024-09-27 20:45:46.390
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpdwd04iqq', 'gs://prj-cxbi-dev-nane1-dsc-ttep-vertex-pipelines/tfx_pipeline_output/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+a73a01923eac3e23d9a8b55381b25fdb3e96843b6c686aff9fbec63879e9f5f9-py3-none-any.whl']' returned non-zero exit status 1.
2024-09-27 20:48:39.515
Finished tearing down training program.
2024-09-27 20:48:39.748
Job failed.
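A note on the `Processing ./gs:/...` line above: when pip does not recognize the `gs://` scheme it falls back to treating the string as a relative file path and normalizes it, which collapses the double slash. That normalization can be reproduced with the standard library (a small illustration, not TFX code):

```python
import posixpath

# pip treats the unrecognized 'gs://' URI as a relative filename and
# normalizes it, collapsing the double slash -- matching the
# "Processing ./gs:/..." line in the log above. (Hypothetical wheel name.)
uri = "gs://bucket/_wheels/tfx_user_code_Evaluator.whl"
print(posixpath.normpath(uri))  # gs:/bucket/_wheels/tfx_user_code_Evaluator.whl
```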

Describe the expected behavior The Evaluator component should install the wheel from the GCS bucket.

Standalone code to reproduce the issue

My files are arranged as such:

- my_pipeline (the name of my pipeline)
     - data
     - models
          - model
              -  __init__.py
              - constants.py
              - model.py
              - predict_extractor.py
              - tuner.py
          - __init__.py
          - features.py
          - preprocessing.py
          - query.sql
     - pipeline
          - __init__.py
          - configs.py
          - pipeline.py
     - __init__.py
     - Dockerfile
     - kubeflow_v2_runner.py
     - Makefile
     - requirements.txt 

Dockerfile

FROM tensorflow/tfx:1.15.1
WORKDIR /pipeline
COPY ./ ./
ENV PYTHONPATH="/pipeline:${PYTHONPATH}"

configs.py:

import os

# Pipeline name / model name used to identify this pipeline
PIPELINE_NAME = 'sklearn-penguin'
PROGRAM_ID = 'tfx-poc-pipelines'
GOOGLE_CLOUD_PROJECT = 'prj-cxbi-dev-nane1-dsc-ttep'
GOOGLE_CLOUD_REGION = 'northamerica-northeast1'
GCS_BUCKET_NAME = GOOGLE_CLOUD_PROJECT + '-vertex-pipelines'

_OUTPUT_DIR = os.path.join('gs://', GCS_BUCKET_NAME)
_PIPELINE_ROOT = os.path.join(_OUTPUT_DIR, 'tfx_pipeline_output', PIPELINE_NAME)

# The Evaluator component needs a custom extractor in order to make predictions for non-TensorFlow models.
# Provide the path to the predict_extractor.py module file which contains the extractor logic (if applicable).
#EVALUATOR_MODULE_FILE = None
EVALUATOR_MODULE_FILE = os.path.join('models', 'model', 'predict_extractor.py')

kubeflow_v2_runner.py:

import os
from absl import logging

from pipeline import configs, pipeline
from tfx.orchestration.kubeflow.v2 import kubeflow_v2_dag_runner
from tfx.proto import trainer_pb2

def run():

  runner_config = kubeflow_v2_dag_runner.KubeflowV2DagRunnerConfig(
          default_image=configs.PIPELINE_IMAGE)

  dsl_pipeline = pipeline.create_pipeline(**args)  # pipeline args elided for brevity

  runner = kubeflow_v2_dag_runner.KubeflowV2DagRunner(config=runner_config)
  runner.run(pipeline=dsl_pipeline)

if __name__ == '__main__':
    logging.set_verbosity(logging.INFO)
    run()

Component init in pipeline.py:

import os
import sys
import logging
import json

from typing import Any, Callable, Dict, List, Optional, Tuple, Union, Text

from tfx.components.example_gen.custom_executors import avro_executor, parquet_executor

import tensorflow_model_analysis as tfma
from tfx import v1 as tfx

from ml_metadata.proto import metadata_store_pb2

def create_pipeline(**args):

  components = []

  example_gen = tfx.components.CsvExampleGen(input_base=data_path)
  components.append(example_gen)

  trainer = tfx.components.Trainer(**trainer_args)
  components.append(trainer)

  model_resolver = tfx.dsl.Resolver(
      strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
      model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
      model_blessing=tfx.dsl.Channel(
          type=tfx.types.standard_artifacts.ModelBlessing)).with_id(
              'latest_blessed_model_resolver')

  # append to components list
  components.append(model_resolver)

  evaluator = tfx.components.Evaluator(
          module_file=evaluator_module_file,
          examples=example_gen.outputs["examples"],
          model=trainer.outputs["model"],
          baseline_model=model_resolver.outputs["model"],
          eval_config=eval_configs,
      )
  components.append(evaluator)

  return tfx.dsl.Pipeline(
          pipeline_name=pipeline_name,
          pipeline_root=pipeline_root,
          components=components,
          # Change this value to control caching of execution results. Default value is `False`.
          enable_cache=True,
          metadata_connection_config=metadata_connection_config
  )

Providing a bare minimum test case or step(s) to reproduce the problem will greatly help us to debug the issue. If possible, please share a link to Colab/Jupyter/any notebook.

  1. I compile the pipeline using the TFX CLI: tfx pipeline update --pipeline_path=kubeflow_v2_runner.py --engine=vertex --build_image
  2. This builds my Dockerfile with my custom logic and the wheel file for my evaluator extractor, as shown in the attached screenshot.
  3. I then submit the pipeline to Vertex Pipelines, which successfully creates a DAG (screenshot attached).
  4. Everything runs fine until the Evaluator, which produces the logs above: downloaded-logs-20240930-132734.json

Notably, the only other reference I could find to this is someone hitting the same issue here: https://github.com/tensorflow/tfx/issues/3761#issuecomment-884941242, reported back in 0.30.0, where no resolution was provided.

adammkerr commented 1 month ago

Some supplementary information.

I ran the pipeline locally using LocalDagRunner() and confirmed that the pipeline works as intended. The LocalDagRunner creates the wheel, writes it locally to my file system, launches the pipeline, and retrieves / installs the wheel as intended.

Providing a bare minimum test case or step(s) to reproduce

  1. In my Compute Engine instance I ran the exact same code as above, substituting tfx.orchestration.LocalDagRunner for kubeflow_v2_dag_runner.KubeflowV2DagRunner
  2. The logs from the Evaluator component are provided here:
INFO:absl:Evaluating model.
INFO:absl:Installing '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/home/adam.kerr/miniconda3/envs/tfx_114/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp8uftiauw', '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl']
I0000 00:00:1728331168.955670  185951 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
Processing /home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl
Installing collected packages: tfx-user-code-Evaluator
Successfully installed tfx-user-code-Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45
INFO:absl:Successfully installed '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "Accuracy",\n          "threshold": {\n            "change_threshold": {\n              "absolute": -1e-10,\n              "direction": "HIGHER_IS_BETTER"\n            },\n            "value_threshold": {\n              "lower_bound": 0.6\n            }\n          }\n        }\n      ]\n    }\n  ],\n  "model_specs": [\n    {\n      "label_key": "species"\n    }\n  ],\n  "slicing_specs": [\n    {}\n  ]\n}', 'example_splits': 'null', 'fairness_indicator_thresholds': 'null', 'module_path': 'predict_extractor@/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl'} 'custom_extractors'
INFO:absl:Installing '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/home/adam.kerr/miniconda3/envs/tfx_114/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpfhem5lvq', '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl']
I0000 00:00:1728331170.640102  185951 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
Processing /home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl
Installing collected packages: tfx-user-code-Evaluator
Successfully installed tfx-user-code-Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45
INFO:absl:Successfully installed '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl'.
INFO:absl:eval_shared_models have model_types: {'tf_generic'}
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Evaluation complete. Results written to /home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/evaluation/42.
INFO:absl:Checking validation results.
WARNING:tensorflow:From /home/adam.kerr/miniconda3/envs/tfx_114/lib/python3.8/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:111: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From /home/adam.kerr/miniconda3/envs/tfx_114/lib/python3.8/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:111: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:absl:Blessing result True written to /home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/blessing/42.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 42 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'blessing': [Artifact(artifact: uri: "/home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/blessing/42"
, artifact_type: name: "ModelBlessing"
)], 'evaluation': [Artifact(artifact: uri: "/home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/evaluation/42"
, artifact_type: name: "ModelEvaluation"
)]}) for execution 42
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Evaluator is finished.

This leads me to believe one of the following:

  1. My process is broken and I am compiling my pipeline incorrectly for KubeflowV2DagRunner: I am missing a step, or some code, that allows the Evaluator component to create the UDF wheel and retrieve it from the GCS pipeline root properly.
  2. There is a bug in kubeflow_v2_dag_runner.KubeflowV2DagRunner which does not create the wheel and its dependencies correctly for the Evaluator component.
  3. There is missing functionality / support in udf_utils.py for retrieving wheels from a GCS location. As shown in my experiment above, the Evaluator component can successfully create and install UDF wheels from a local file system using LocalDagRunner, but cannot install a UDF wheel from a GCS bucket using KubeflowV2DagRunner.

Really looking for any guidance here. Thank you in advance.

janasangeetha commented 1 month ago

Hi @adammkerr, Thank you for reporting. I'll investigate and provide an update here.

nikelite commented 1 month ago

This is likely a bug in the UDF util implementation. It should use local_pip_package_path for the pip install when the path refers to a remote file system.

https://github.com/tensorflow/tfx/blob/c08360b3525a1d1af8d267933cc06f24af686dff/tfx/components/evaluator/executor.py#L123

However, the UDF util currently uses the original path instead.

https://github.com/tensorflow/tfx/blob/c08360b3525a1d1af8d267933cc06f24af686dff/tfx/components/evaluator/executor.py#L244
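As a rough illustration of the fix direction (my sketch, not the actual TFX patch): stage the remote wheel to a local path before handing it to pip. The helper names below are hypothetical, and `copy_fn` is injectable purely for illustration; the real implementation would copy `gs://` paths through a GCS-aware filesystem such as TFX's fileio rather than `shutil`.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from urllib.parse import urlparse


def ensure_local_wheel(wheel_path, scratch_dir, copy_fn=shutil.copy):
    """Return a path that pip can consume, staging remote wheels locally first.

    Hypothetical helper: for gs:// and similar schemes the real fix would
    copy through a GCS-aware filesystem (e.g. TFX's fileio).
    """
    if urlparse(wheel_path).scheme in ("", "file"):
        return wheel_path  # already local; pip handles it as-is
    staged = os.path.join(scratch_dir, os.path.basename(wheel_path))
    copy_fn(wheel_path, staged)  # would be fileio.copy(...) in TFX
    return staged


def install_wheel_to_temp_directory(wheel_path, target_dir):
    """Sketch of the install step: stage remote wheels, then pip install."""
    with tempfile.TemporaryDirectory() as scratch:
        local_wheel = ensure_local_wheel(wheel_path, scratch)
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install",
             "--target", target_dir, local_wheel]
        )
```

With this shape, a local wheel path passes through untouched (matching the working LocalDagRunner behavior), while a `gs://` path is first materialized under a temporary scratch directory so pip only ever sees a real filename.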

janasangeetha commented 1 month ago

Hi @adammkerr, The changes are merged. Please run the pipeline and let us know if you need any help! Thank you

adammkerr commented 1 month ago

I can confirm that it works! Thank you @janasangeetha @nikelite, much much appreciated!

I see this has been merged into the master branch, will this change be included in the next official SDK / official container release (assuming it would be 1.16.0)?

janasangeetha commented 1 month ago

Hi @adammkerr, Thanks for confirming! Yes the changes will reflect in the next official release.

adammkerr commented 1 month ago

Lovely! Thanks again everyone, much appreciated!

github-actions[bot] commented 1 month ago

Are you satisfied with the resolution of your issue?