Closed valeriano-manassero closed 4 years ago
Unfortunately, that's a known issue. A full-fledged Kubeflow deployment does not have the right MLMD config to use gRPC the way TFX 0.21.0 does. There are two solutions to this issue:
1. Can you try a standalone KFP deployment (this is the only thing you need to run a TFX pipeline if you do not use Kubeflow notebooks, Katib and so on) with version >= 0.2.1? You can find the deployment instructions here
2. We can work out a kubeflow_metadata_config that works with a full-fledged Kubeflow deployment; this might take 1 or 2 days.
Hi @numerology, and thanks for the answer. Unfortunately, Katib is a requirement for this test deployment, so I can't avoid it. At the moment I'm not sure I have enough time to dive deep into the code and open a PR. I will try if you don't have an implementation before then.
Does this block the use of tfx with Kubeflow Pipelines only on local clusters, or also on GCP etc.?
Could you guys perhaps give an indication on the priority of this issue? It would certainly help with decisions going forward on the use of tfx with kubeflow and considering possible alternatives. Many thanks!
To solve the issue, you should change the configuration:
metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
metadata_config.mysql_db_service_host.value = 'mysql.kubeflow'
metadata_config.mysql_db_service_port.value = "3306"
metadata_config.mysql_db_name.value = "metadb"
metadata_config.mysql_db_user.value = "root"
metadata_config.mysql_db_password.value = ""
metadata_config.grpc_config.grpc_service_host.value = 'metadata-grpc-service'
metadata_config.grpc_config.grpc_service_port.value = '8080'
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=metadata_config
)
I can confirm this workaround is good for Kubeflow 1.0 on premise. ty!
After some testing, I see that the gRPC config alone should be enough; at least I haven't noticed any issues with this:
metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
metadata_config.grpc_config.grpc_service_host.value = 'metadata-grpc-service'
metadata_config.grpc_config.grpc_service_port.value = '8080'
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=metadata_config
)
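If it is unclear whether the gRPC host and port in the config are reachable from where the pipeline pods run, a quick TCP check can narrow things down. This is only a minimal sketch: `can_reach` is a hypothetical helper name, it uses nothing but the Python standard library, and it only confirms that something accepts connections on that port, not that the MLMD gRPC service behind it is healthy.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical diagnostic helper, not part of TFX or Kubeflow.
    DNS failures, refused connections and timeouts all return False.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror, ConnectionRefusedError and timeouts are OSError subclasses.
        return False
```

For example, calling `can_reach('metadata-grpc-service', 8080)` from a pod in the same namespace as the service should return True if the name resolves and the port is open.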
I'd like to add that if your pod is running in a different namespace, you need to append the namespace of the grpc backend to the grpc host name:
metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
metadata_config.grpc_config.grpc_service_host.value = 'metadata-grpc-service.kubeflow'
metadata_config.grpc_config.grpc_service_port.value = '8080'
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=metadata_config
)
For instance.
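The namespace rule above can be written as a tiny helper so the host name is not hard-coded. This is just a sketch: `qualified_service_host` and the `my-pipelines` namespace are made up for illustration. It relies on the standard Kubernetes DNS behaviour that a service is reachable as `<service>` from inside its own namespace and as `<service>.<namespace>` from other namespaces.

```python
from typing import Optional


def qualified_service_host(service: str,
                           service_namespace: str,
                           pod_namespace: Optional[str] = None) -> str:
    """Return the host name a pod should use to reach a Kubernetes service.

    Hypothetical helper for illustration only. If the pod namespace is
    unknown (None) or differs from the service namespace, the namespace
    is appended to the service name.
    """
    if pod_namespace == service_namespace:
        # Same namespace: the short service name resolves on its own.
        return service
    return f"{service}.{service_namespace}"


if __name__ == "__main__":
    # 'my-pipelines' is an assumed namespace name, not taken from this thread.
    print(qualified_service_host("metadata-grpc-service", "kubeflow", "my-pipelines"))
```

The returned string can then be assigned to metadata_config.grpc_config.grpc_service_host.value as in the snippet above.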
Kubernetes: 1.15
Kubeflow: 1.0RC4
TFX: 0.21.0
While testing
I got:
In past TFX versions I had the issues described in https://github.com/tensorflow/tfx/issues/1002. Now TFX gets the metadata config via gRPC, but it is not getting the expected configs (maybe the new Kubeflow version is also involved).