Hello @strangemonad and thank you for reporting this issue!

There seems to be some left-over confusion about what `zenml stack up` really does. Traditionally, this command was used exclusively to provision resources for local stack components, like the local k3d cluster and Kubeflow deployment for the kubeflow orchestrator. However, this has changed with more recent ZenML versions to cover use-cases that connect directly to remote services, like the kubeflow orchestrator in your case.

Even when `zenml stack up` doesn't provision local resources, you still have to run it in some cases to forward remote ports locally. This is the case here with the kubeflow metadata store: you have to run `zenml stack up` to forward the remote gRPC metadata-store port locally via a `kubectl port-forward` command, otherwise you won't be able to access the metadata store in the post-execution workflow.
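For context, this is the kind of post-execution code that needs the forwarded metadata-store port to be reachable on localhost. A minimal sketch, assuming the post-execution API from the 0.10.x line; the pipeline name is hypothetical:

```python
from zenml.repository import Repository

# Inspecting past runs goes through the metadata store. With the kubeflow
# metadata store, this only works while the gRPC port is forwarded to
# localhost, i.e. after `zenml stack up` has started the port-forward.
repo = Repository()
pipeline = repo.get_pipeline(pipeline_name="my_pipeline")  # hypothetical pipeline name
last_run = pipeline.runs[-1]
for step in last_run.steps:
    print(step.name, step.status)
```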
In the case of the kubeflow orchestrator component, there are some configuration attributes that you can tweak to completely remove the need to forward ports locally and to connect directly to the remote ports (see this issue for more info).
We could implement a similar logic for the kubeflow metadata store:

- add a `skip_metadata_daemon_provisioning` bool attribute that you can set to remove the need to run `zenml stack up`
- detect when the `host` stack component attribute is set to something other than the default localhost value and skip forwarding the port locally in that case (which, again, removes the need to run `zenml stack up`; see the sketch below)

Please let me know if you think that would address your use-case.
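To make the second option concrete, here is an illustrative Python sketch of the detection logic. This is not ZenML's actual implementation: the config class, the default port, and the in-cluster service address are assumptions; only the `skip_metadata_daemon_provisioning` name comes from the proposal above.

```python
from dataclasses import dataclass

@dataclass
class MetadataStoreConfig:
    """Illustrative stand-in for the kubeflow metadata store settings."""
    host: str = "127.0.0.1"
    port: int = 8081
    skip_metadata_daemon_provisioning: bool = False  # hypothetical flag from the proposal above

def needs_local_port_forward(cfg: MetadataStoreConfig) -> bool:
    """Return True only when a local port-forward (via `zenml stack up`) is required."""
    if cfg.skip_metadata_daemon_provisioning:
        return False
    # A non-localhost host means the store is reachable directly, so nothing
    # needs to be forwarded and `zenml stack up` becomes unnecessary.
    return cfg.host in ("127.0.0.1", "localhost")

# Example: pointing directly at the in-cluster gRPC service (address is illustrative).
remote = MetadataStoreConfig(host="metadata-grpc-service.kubeflow", port=8080)
assert not needs_local_port_forward(remote)
```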
Isn't this also related somehow to #728? Perhaps @VictorW96 can confirm this is the behavior he sees?
@stefannica interesting and I see the reasoning. I think this might need more nuance though. @RoyerRamirez and I are preparing a more comprehensive writeup of all the rough edges we've run into getting a pipeline working against an AWS Kubeflow deployment.
For this one in particular, I think there needs to be a way to have it both ways.
Sorry to butt into the conversation, but @strangemonad you might appreciate our new repo here: https://github.com/zenml-io/mlops-stacks

It allows you to quickly get a cloud-based stack running with some opinionated configuration. We also have a rehaul of the docs coming up with more focus on the cloud stuff.

It isn't finalized yet and we have not really launched it, but the goal would also be to link these stack recipes to `zenml stack` somehow. WDYT?
> @stefannica interesting and I see the reasoning. I think this might need more nuance though. @RoyerRamirez and I are preparing a more comprehensive writeup of all the rough edges we've run into getting a pipeline working against an AWS Kubeflow deployment.
To say that I'm really looking forward to reading it would be an understatement :smile:
@stefannica @htahir1 @RoyerRamirez here are the rough notes establishing the context of what we're trying to set up and the roadblocks we hit: https://notes.strangemonad.com/Zenml+stack+setup+thoughts (still rough, but hopefully it sketches enough of an outline).
@htahir1 I had seen the repo. Setting up the infrastructure with terraform isn't our roadblock (though I suspect it is for many that don't have in-house DevOps and k8s expertise). We have a functional Kubeflow stack and we're trying to target that for the time being. There's a lot in the way of ML metadata tracking, visualization and relative maturity with KFP that we're not willing to step away from yet in favor of, say, the zenml k8s orchestrator until that's more mature (e.g. how can I run dynamic conditional steps, or parallel steps controlling for max-concurrency using results from a previous step?).
Hey @stefannica, @RoyerRamirez and I are still experiencing this issue (btw, @strangemonad explained it in detail in his notes above); I am wondering if this is already on your radar.
@amirhessam88 With the new changes we are undergoing, this issue will resolve itself. For now maybe one of @fa9r, @stefannica or @schustmi can help?
With the new release, there is no metadata store, so I would ask @strangemonad to close the issue if they think it's fine? :-)
@htahir1 Thanks Hamza! We have not had a chance to test out the new version and pinned our work at `v0.13.2`. Shawn and I have plans to try it out and see what part of our work should be changed accordingly. I have seen that you guys have some recipes in the docs. One question I have is: do you think another release with some breaking changes might come soon? I think I read somewhere you are pushing on release `v1.0.0`. And feel free to close the issue. Thanks!
@amirhessam88 We are racing towards 1.0.0! Can't promise a date yet but I would say the biggest changes are behind us with the architecture change. Maybe some stuff will change around database schemas when we drop MLMD as a dependency and secret managers might move out of the stack, but that is all migratable. I will close the issue now - let us know how your upgrade goes!
Contact Details [Optional]
shawnmorel@gmail.com
System Information
ZenML version: 0.10.0
Install path: /home/jovyan/factory-data-algorithms/projects/common/process-health/.venv/lib/python3.9/site-packages/zenml
Python version: 3.9.9
Platform information: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '20.04'}
Environment: docker
Integrations: ['aws', 'kubeflow', 's3', 'scipy', 'seldon', 'slack']
What happened?
When targeting a remote Kubeflow Pipelines deployment with a kubeflow orchestrator and kubeflow metadata-store, `stack.deploy_pipeline()` checks `if not component.is_running` for each component. The kubeflow orchestrator correctly determines that it's not running locally and returns `True`, but the kubeflow metadata store reports that it's not running (we don't want `zenml stack up` to run a local kfp metadata store, we just want to deploy the pipeline and let `inside_kfp_pod` do its magic when resolving `get_tfx_metadata_config`).
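To illustrate what is being described, here is a rough sketch of the kind of guard the deployment path applies. Only the `is_running` check comes from the report above; the function, error, and message are hypothetical and not ZenML's actual source.

```python
class NotProvisionedError(RuntimeError):
    """Raised when a stack component does not report itself as running."""

def check_components_running(components) -> None:
    # Each stack component is expected to expose an `is_running` property.
    # A remote-only kubeflow metadata store reports False here unless its
    # gRPC port has been forwarded locally, which blocks deployment even
    # though nothing local actually needs to run.
    for component in components:
        if not component.is_running:
            raise NotProvisionedError(
                f"Component {component!r} is not running; "
                "run `zenml stack up` to provision or forward it."
            )
```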
Reproduction steps
No response