Closed adammkerr closed 1 month ago
Some supplementary information.
I ran the pipeline locally using LocalDagRunner() and confirm that the pipeline works as intended. The LocalDagRunner creates the wheel, writes it locally to my file system, launches the pipeline, and retrieves / installs the wheel as intended.
Swapping kubeflow_v2_dag_runner.KubeflowV2DagRunner for tfx.orchestration.LocalDagRunner, the Evaluator logs are:
INFO:absl:Evaluating model.
INFO:absl:Installing '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/home/adam.kerr/miniconda3/envs/tfx_114/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp8uftiauw', '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl']
I0000 00:00:1728331168.955670 185951 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
Processing /home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl
Installing collected packages: tfx-user-code-Evaluator
Successfully installed tfx-user-code-Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45
INFO:absl:Successfully installed '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'eval_config': '{\n "metrics_specs": [\n {\n "metrics": [\n {\n "class_name": "Accuracy",\n "threshold": {\n "change_threshold": {\n "absolute": -1e-10,\n "direction": "HIGHER_IS_BETTER"\n },\n "value_threshold": {\n "lower_bound": 0.6\n }\n }\n }\n ]\n }\n ],\n "model_specs": [\n {\n "label_key": "species"\n }\n ],\n "slicing_specs": [\n {}\n ]\n}', 'example_splits': 'null', 'fairness_indicator_thresholds': 'null', 'module_path': 'predict_extractor@/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl'} 'custom_extractors'
INFO:absl:Installing '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/home/adam.kerr/miniconda3/envs/tfx_114/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpfhem5lvq', '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl']
I0000 00:00:1728331170.640102 185951 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
Processing /home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl
Installing collected packages: tfx-user-code-Evaluator
Successfully installed tfx-user-code-Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45
INFO:absl:Successfully installed '/home/adam.kerr/tfx/pipelines/sklearn-penguin/_wheels/tfx_user_code_Evaluator-0.0+0fa9fc021b3711ba58ec79a2e66725c6fda2ecbc953174b39fdd08d45850ce45-py3-none-any.whl'.
INFO:absl:eval_shared_models have model_types: {'tf_generic'}
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Evaluation complete. Results written to /home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/evaluation/42.
INFO:absl:Checking validation results.
WARNING:tensorflow:From /home/adam.kerr/miniconda3/envs/tfx_114/lib/python3.8/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:111: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From /home/adam.kerr/miniconda3/envs/tfx_114/lib/python3.8/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:111: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
INFO:absl:Blessing result True written to /home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/blessing/42.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 42 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'blessing': [Artifact(artifact: uri: "/home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/blessing/42"
, artifact_type: name: "ModelBlessing"
)], 'evaluation': [Artifact(artifact: uri: "/home/adam.kerr/tfx/pipelines/sklearn-penguin/Evaluator/evaluation/42"
, artifact_type: name: "ModelEvaluation"
)]}) for execution 42
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Evaluator is finished.
This leads me to believe one of the following:
(1) I am missing a step, or some code, that allows the Evaluator component to create the UDF wheel and retrieve it from a GCS bucket pipeline root properly when using KubeflowV2DagRunner.
(2) There is a bug in kubeflow_v2_dag_runner.KubeflowV2DagRunner, which does not create the wheel and dependencies correctly for the Evaluator component.
Really looking for any guidance here. Thank you in advance.
Hi @adammkerr, Thank you for reporting. I'll investigate and provide an update here.
This is likely a bug in the UDF util implementation. When the wheel path refers to a remote file system, it should pass local_pip_package_path (the locally downloaded copy) to pip install.
However, the UDF util currently passes the original remote path instead.
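For anyone following along, here is a minimal sketch of the behavior described in the comment above. It is not the actual TFX code: the function names (split_module_path, is_remote, stage_wheel_locally, build_pip_command) and the copy_fn parameter are hypothetical and exist only for illustration. The 'module@wheel_path' form and the pip arguments are taken from the LocalDagRunner log earlier in this thread, and local_pip_package_path is the name used in the comment above.

```python
import os
import sys
from typing import Callable, List, Tuple


def split_module_path(module_path: str) -> Tuple[str, str]:
    """Split 'module_name@wheel_path' (the form seen in the log) into its parts."""
    module_name, sep, wheel_path = module_path.partition("@")
    if not sep:
        raise ValueError(f"expected 'module@wheel_path', got {module_path!r}")
    return module_name, wheel_path


def is_remote(path: str) -> bool:
    """Assumption for this sketch: any 'scheme://' path (e.g. gs://) is remote."""
    return "://" in path


def stage_wheel_locally(
    wheel_path: str,
    download_dir: str,
    copy_fn: Callable[[str, str], None],
) -> str:
    """Return a path pip can read, copying the wheel down first if it is remote.

    copy_fn is a hypothetical stand-in for a filesystem-aware copy that
    understands remote schemes; it is injected so the sketch stays testable.
    """
    if not is_remote(wheel_path):
        return wheel_path
    local_pip_package_path = os.path.join(
        download_dir, os.path.basename(wheel_path)
    )
    copy_fn(wheel_path, local_pip_package_path)
    return local_pip_package_path


def build_pip_command(local_pip_package_path: str, target_dir: str) -> List[str]:
    """Build the same pip invocation that appears in the log, using the local copy."""
    return [
        sys.executable, "-m", "pip", "install",
        "--target", target_dir,
        local_pip_package_path,
    ]
```

The point of the sketch is only the last step: the path handed to pip is the staged local copy, not the original gs:// path. With a local wheel path, as in the LocalDagRunner run, nothing is copied and the original path is used unchanged, which would explain why the local run succeeds.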
Hi @adammkerr, The changes are merged. Please run the pipeline and let us know if you need any help! Thank you
I can confirm that it works! Thank you @janasangeetha @nikelite, much much appreciated!
I see this has been merged into the master branch. Will this change be included in the next official SDK / official container release (assuming it would be 1.16.0)?
Hi @adammkerr, Thanks for confirming! Yes, the changes will be reflected in the next official release.
Lovely! Thanks again everyone, much appreciated!
Describe the current behavior
I am attempting to provide UDFs for Evaluator customization in an Sklearn pipeline, following the example here: https://github.com/tensorflow/tfx/blob/master/tfx/examples/penguin/experimental/README.md
Providing a module_file to the Evaluator component causes the pipeline to fail. The failure in the logs is:
Describe the expected behavior
The Evaluator component should properly install the wheel from the GCS bucket.
Standalone code to reproduce the issue
My files are arranged as such:
Dockerfile
configs.py:
kubeflow_v2_runner.py:
Component init in pipeline.py:
Providing a bare minimum test case or step(s) to reproduce the problem will greatly help us to debug the issue. If possible, please share a link to Colab/Jupyter/any notebook.
tfx pipeline update --pipeline_path=kubeflow_v2_runner.py --engine=vertex --build_image
Ironically, the only reference I could find to this was someone having the same issue here: https://github.com/tensorflow/tfx/issues/3761#issuecomment-884941242 which was an issue back in 0.30.0, where no resolution was provided.