axeltidemann opened this issue 3 years ago
@axeltidemann, this issue seems relevant to Kubeflow. Please raise it in the Kubeflow issue tracker, and confirm if this seems okay. Thanks.
@arghyaganguly But I see `mlpipeline-ui-metadata` showing up automatically in the KubeFlow UI, and it also comes from TFX (see https://github.com/tensorflow/tfx/blob/e0cb043ff5d3a9fc33f20b1ce6348518e68352ff/tfx/orchestration/kubeflow/base_component.py). Given that TFX is built on top of KubeFlow and I am using a TFX custom component, it must be a TFX-relevant issue, no? How would the KubeFlow team know the answers to TFX-specific questions? (There could be some overlap, of course; I am happy to stand corrected.)
Sorry to bother you @jiyongjung0, but I'd really appreciate your input when you have the time. Thanks.
I'm sorry for the late response. I'm not very familiar with Kubeflow stuff and was trying to find a better person to respond. @neuromage, could you give some help on this issue?
It seems that `mlpipeline-metrics` does not get propagated at all; if it were, it would have been added to the `output_artifact_paths` dictionary: https://github.com/tensorflow/tfx/blob/e0cb043ff5d3a9fc33f20b1ce6348518e68352ff/tfx/orchestration/kubeflow/base_component.py#L131
In addition, it would have to be handled in the container entry point, like `mlpipeline-ui-metadata` is: https://github.com/tensorflow/tfx/blob/511763835e8f982ecb05f31be3903040179f3968/tfx/orchestration/kubeflow/container_entrypoint.py#L291
Is there a specific reason for this omission? Or maybe something for a pull request?
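For context, here is a minimal sketch of what KFP 1.x expects (my illustration, not TFX source; the pipeline name and command are placeholders):

```python
# Illustration only: in KFP 1.x, any ContainerOp that lists a file under
# output_artifact_paths has that file collected after the step runs, and the
# 'mlpipeline-metrics' key is what the "Run output" tab renders.
import kfp.dsl as dsl


@dsl.pipeline(name='metrics-demo')  # placeholder pipeline name
def metrics_demo():
    dsl.ContainerOp(
        name='trainer',
        image='tensorflow/tfx:0.25.0',
        command=['python', '-m', 'my_module'],  # placeholder command
        output_artifact_paths={
            'mlpipeline-ui-metadata': '/mlpipeline-ui-metadata.json',
            'mlpipeline-metrics': '/mlpipeline-metrics.json',
        },
    )
```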
I tried to make changes to the source code of TFX itself (following the instructions here), where I basically implemented the changes above, i.e.

```python
output_artifact_paths={
    'mlpipeline-ui-metadata': '/mlpipeline-ui-metadata.json',
    'mlpipeline-metrics': '/mlpipeline-metrics.json'
}
```

in `tfx/tfx/orchestration/kubeflow/base_component.py`, and also hardcoded metrics and file output in `tfx/tfx/orchestration/kubeflow/container_entrypoint.py`, like so:
```python
import json  # already imported at the top of container_entrypoint.py

metrics = {
    'metrics': [
        {
            'name': 'RMSE-validation',
            'numberValue': 777.77,
            'format': 'RAW'
        }
    ]
}
with open('/mlpipeline-metrics.json', 'w') as _file:
    json.dump(metrics, _file)
```
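As a sanity check (my suggestion, not part of the original patch), the write can be verified by reading the file back inside the entry point:

```python
# Debugging aid: confirm the metrics file exists where KFP's artifact
# collection expects it before the container exits.
with open('/mlpipeline-metrics.json') as _file:
    print('mlpipeline-metrics:', _file.read())
```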
This was still not picked up by the KubeFlow UI. I assume there are some deeper changes needed, then. Maybe @neuromage can shed some light on this?
Hi @axeltidemann, those changes look correct to me.
/cc @numerology and @chensun, any ideas why the above may not be working?
Changing `output_artifact_paths` in `base_component.py` should suffice. If that is not picked up by the UI, then it seems like a bug to me.
May I ask which KFP version you are using (both SDK and deployment)?
Good question. I don't specify which KFP version to use in deployment; I use the `tfx` CLI. My assumption was that it creates a Docker image from my local installation and uploads that to eu.gcr.io, and therefore would use my local KFP version, but I can't figure out how to determine which KFP version is actually used on the cluster. Is there a way to find that out?
These are my local versions, in any case:
```
>python
Python 3.7.9 (default, Nov 20 2020, 18:45:38)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tfx
>>> tfx.__version__
'0.28.0.dev'
>>> import kfp
>>> kfp.__version__
'1.3.0'
```
> but I can't figure out how to determine which KFP version is actually used on the cluster. Is there a way to find that out?
If you have access to the KFP UI, the version is shown in the bottom left corner. Alternatively, if you have `kubectl` connected to your cluster, you can describe any KFP pod, for example:
```
kubectl describe pod ml-pipeline-76fddff986-h7hsh -n kubeflow
```

The container image tag (1.2.0 in the output below) is the KFP backend version:

```
Containers:
  ml-pipeline-api-server:
    Container ID:  docker://a84dc475d6b6fb6e9dc58204e58e6c606498239f38fa12145e93953458bdd045
    Image:         gcr.io/ml-pipeline/api-server:1.2.0
```
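If you only want the image tag, a one-liner such as the following should also work, assuming the API server deployment carries the standard `app=ml-pipeline` label:

```
kubectl get pods -n kubeflow -l app=ml-pipeline \
  -o jsonpath='{.items[0].spec.containers[0].image}'
```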
Thanks, @chensun. The version displayed is indeed 1.0.4, and the container image label in the YAML file in the KubeFlow UI is `ml-pipeline-api-server: gcr.io/cloud-marketplace/google-cloud-ai-platform/kubeflow-pipelines/apiserver:1.0.4`.
However, could it be that local changes I make to TFX are not packaged and uploaded to the KubeFlow cluster in any case?
@numerology I suppose I should create a separate Docker image with my changes to TFX, push that to Docker Hub, and make the `tfx` CLI use that image. When running

```
tfx pipeline update --pipeline-path=kubeflow_runner.py --endpoint=$ENDPOINT
```

I see that the `tensorflow/tfx:0.25.0` image is used:
```
[truncated]
[Skaffold] #3 [internal] load metadata for docker.io/tensorflow/tfx:0.25.0
[Skaffold] #3 sha256:0de1d35ca0abce93f6f1d57543269f062bb56777e77abd8be41593a801cd2d61
[Skaffold] #3 DONE 2.8s
[Skaffold]
[Skaffold] #7 [1/3] FROM docker.io/tensorflow/tfx:0.25.0@sha256:0700c27c6492b8b2998e7d543ca13088db8d40ef26bd5c6eec58245ff8cdec35
[Skaffold] #7 sha256:8e5e2c00eb5ed31ca14860fd9aa40e783fe78ad12be31dc9da89ddad19876dc9
[Skaffold] #7 DONE 0.0s
[truncated]
```
However, I cannot figure out where to set which Docker image to use. I have even tried searching the repository for `load metadata for`, but no results came up. Any ideas?
@axeltidemann Indeed, in order to do that I believe you'll need to specify the base image when running the CLI command. For example:
```
tfx pipeline create --pipeline-path=kubeflow_runner.py --endpoint=$ENDPOINT --build_base_image your-docker-hub-repo/your-tfx-image --build_target_image your-docker-hub-repo/your-image-for-this-pipeline
```
Also, please refer to the help message for the `--build_target_image` option in https://github.com/tensorflow/tfx/blob/HEAD/tfx/tools/cli/commands/pipeline.py for advanced image-building options.
I wanted to mention how important getting TFX metrics into the Kubeflow UI is for my team. I'm curious whether this is still an issue in the Kubeflow v2 runner? I haven't been able to try it out.
@easadler The Kubeflow v2 runner is still being developed. Currently, it only compiles TFX DSL objects into the KFP IR spec. The story of visualization in the Kubeflow v2 runner is still being discussed.
/cc @neuromage
@numerology I was able to create a custom `build-base-image` of TFX with the changes I referenced above (built with `./tfx/tools/docker/build_docker_image.sh` and tagged `eu.gcr.io/my-project/custom-tfx-image`) and pushed it to GCR. I then ran

```
tfx pipeline create --engine kubeflow --build-target-image eu.gcr.io/my-project/my-tfx-pipeline --build-base-image eu.gcr.io/my-project/custom-tfx-image --endpoint $ENDPOINT --pipeline-path kubeflow_runner.py
```

and I can see

```
[Skaffold] Step 1/4 : FROM eu.gcr.io/my-project/custom-tfx-image
```

when creating the pipeline. I also read the metrics file back in `container_entrypoint.py` after writing it, so I am sure it is successfully written. Still, no `mlpipeline-metrics` shows up in the KubeFlow UI. The SDK KFP version is 1.4 and the KubeFlow deployment is 1.0.4; could this be an issue? It was my understanding that KubeFlow Pipelines (1.4) and the deployment of KubeFlow on Kubernetes (1.0.4) are two different things, and that comparing the version numbers is meaningless (but please correct me if I am wrong). Do you have any other ideas?
@neuromage maybe you have some ideas why the above approach does not work?
@neuromage @numerology sorry to bother you again, but do you have any thoughts on this?
@neuromage @numerology @axeltidemann What is the status on this? Is it possible to export metrics and custom metadata with TFX in KubeFlow nowadays?
No progress from my side; when I have time, I'd like to re-try the suggestions I outlined above, just to verify.
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
I still haven't had the time, but I'd very much like to keep this issue open.
I want to export `mlpipeline-metrics` from my custom Python function TFX component so that it is displayed in the KubeFlow UI, as described here: https://www.kubeflow.org/docs/pipelines/sdk/pipelines-metrics/

This is a minimal example of what I am trying to do:
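(The original snippet did not survive extraction; the sketch below is a hedged reconstruction of such a minimal Python-function component, with the component name and metric values as placeholders.)

```python
import json

from tfx.dsl.component.experimental.decorators import component


# Hypothetical minimal component: writes a hardcoded metric to the well-known
# path that KFP's UI scrapes for the "Run output" tab.
@component
def MetricsGen():
    metrics = {
        'metrics': [{
            'name': 'RMSE-validation',
            'numberValue': 777.77,
            'format': 'RAW',
        }]
    }
    with open('/mlpipeline-metrics.json', 'w') as f:
        json.dump(metrics, f)
```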
In the KubeFlow UI, the "Run output" tab says "No metrics found for this run." However, the output artefact shows up in the ML MetaData (see screenshot). Any help on how to accomplish this would be greatly appreciated. Thanks!