kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Is there any way to specify which namespace my pipeline runs in? #2858

Closed hemantha-kumara closed 4 years ago

hemantha-kumara commented 4 years ago

What happened: by default, when we use kfp.Client or dsl.compile, the generated workflow YAML has no metadata.namespace. Because of this, the pipeline always runs in the default kubeflow namespace.

What did you expect to happen: the generated workflow YAML should contain the namespace specified by the user. compiler.compile could take a new argument if required.

What steps did you take:

if __name__ == '__main__':
    import kfp.compiler as compiler

    # VolumeOp_basic is the pipeline function defined earlier in the script
    pipeline_filename = "VolumeOp_basic.tar.gz"
    compiler.Compiler().compile(VolumeOp_basic, pipeline_filename)

With the above snippet, the generated pipeline YAML is as below:

"apiVersion": |-
  argoproj.io/v1alpha1
"kind": |-
  Workflow
"metadata":
  "annotations":
    "pipelines.kubeflow.org/pipeline_spec": |-
      {"description": "A Basic Example on VolumeOp Usage", "inputs": [{"default": "test-pipeline", "name": "volumename"}], "name": "VolumeOp Basic"}
  "generateName": |-
    volumeop-basic-
"spec":
  "arguments":
    "parameters":
    - "name": |-
        volumename
      "value": |-
        test-pipeline
  "entrypoint": |-
    volumeop-basic
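
Note the missing metadata.namespace above. A minimal sketch to confirm this programmatically (it assumes the compiled package contains a single pipeline.yaml, which is how the compiler packages it):

import tarfile
import yaml

# Open the package produced by compiler.Compiler().compile above and
# check whether the workflow metadata carries a namespace.
with tarfile.open("VolumeOp_basic.tar.gz") as tar:
    member = tar.getmembers()[0]  # the archive holds a single pipeline.yaml
    workflow = yaml.safe_load(tar.extractfile(member))

print(workflow["metadata"].get("namespace"))  # prints None: no namespace set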

Anything else you would like to add: even if we pass the namespace to kfp.Client().create_run_from_pipeline_func(pipeline_func=calc_pipeline, arguments=arguments, namespace="hkumara"),

the run fails with the error below:

---------------------------------------------------------------------------
ApiException                              Traceback (most recent call last)
<ipython-input-5-ba5acd101807> in <module>()
      1 arguments = {'a': '7', 'b': '8'}
----> 2 kfp.Client().create_run_from_pipeline_func(pipeline_func=calc_pipeline, arguments=arguments, namespace="hkumara").output()

/opt/cjup/conda/lib/python3.6/site-packages/kfp/_client.py in create_run_from_pipeline_func(self, pipeline_func, arguments, run_name, experiment_name, pipeline_conf, namespace)
    370       (_, pipeline_package_path) = tempfile.mkstemp(suffix='.zip')
    371       compiler.Compiler().compile(pipeline_func, pipeline_package_path, pipeline_conf=pipeline_conf)
--> 372       return self.create_run_from_pipeline_package(pipeline_package_path, arguments, run_name, experiment_name, namespace)
    373     finally:
    374       os.remove(pipeline_package_path)

/opt/cjup/conda/lib/python3.6/site-packages/kfp/_client.py in create_run_from_pipeline_package(self, pipeline_file, arguments, run_name, experiment_name, namespace)
    411     run_name = run_name or pipeline_name + ' ' + datetime.now().strftime('%Y-%m-%d %H-%M-%S')
    412     experiment = self.create_experiment(name=experiment_name)
--> 413     run_info = self.run_pipeline(experiment.id, run_name, pipeline_file, arguments, namespace=namespace)
    414     return RunPipelineResult(self, run_info)
    415 

/opt/cjup/conda/lib/python3.6/site-packages/kfp/_client.py in run_pipeline(self, experiment_id, job_name, pipeline_package_path, params, pipeline_id, namespace)
    342         pipeline_spec=spec, resource_references=resource_references, name=job_name)
    343 
--> 344     response = self._run_api.create_run(body=run_body)
    345 
    346     if self._is_ipython():

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/api/run_service_api.py in create_run(self, body, **kwargs)
    149             return self.create_run_with_http_info(body, **kwargs)  # noqa: E501
    150         else:
--> 151             (data) = self.create_run_with_http_info(body, **kwargs)  # noqa: E501
    152             return data
    153 

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/api/run_service_api.py in create_run_with_http_info(self, body, **kwargs)
    226             _preload_content=params.get('_preload_content', True),
    227             _request_timeout=params.get('_request_timeout'),
--> 228             collection_formats=collection_formats)
    229 
    230     def delete_run(self, id, **kwargs):  # noqa: E501

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    328                                    response_type, auth_settings,
    329                                    _return_http_data_only, collection_formats,
--> 330                                    _preload_content, _request_timeout)
    331         else:
    332             thread = self.pool.apply_async(self.__call_api, (resource_path,

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    159             post_params=post_params, body=body,
    160             _preload_content=_preload_content,
--> 161             _request_timeout=_request_timeout)
    162 
    163         self.last_response = response_data

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    371                                          _preload_content=_preload_content,
    372                                          _request_timeout=_request_timeout,
--> 373                                          body=body)
    374         elif method == "PUT":
    375             return self.rest_client.PUT(url,

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/rest.py in POST(self, url, headers, query_params, post_params, body, _preload_content, _request_timeout)
    273                             _preload_content=_preload_content,
    274                             _request_timeout=_request_timeout,
--> 275                             body=body)
    276 
    277     def PUT(self, url, headers=None, query_params=None, post_params=None,

/opt/cjup/conda/lib/python3.6/site-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    226 
    227         if not 200 <= r.status <= 299:
--> 228             raise ApiException(http_resp=r)
    229 
    230         return r

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Thu, 16 Jan 2020 15:15:08 GMT', 'Content-Length': '140'})
HTTP response body: {"error":"unknown value \"NAMESPACE\" for enum api.ResourceType","message":"unknown value \"NAMESPACE\" for enum api.ResourceType","code":3}
numerology commented 4 years ago

Basically, your pipeline runs in the namespace where KFP is deployed.

How to specify the namespace in which KFP is deployed depends on the deployment option you are using. Taking the standalone deployment as an example, you can refer to the guideline at https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize#change-deploy-namespace

numerology commented 4 years ago

/assign @numerology

nrchakradhar commented 4 years ago

In a multi-user, multi-tenant cluster with multiple namespaces for resource isolation, can pipelines be run in different namespaces, similar to what other Kubeflow controllers like the notebook controller, TFJob, etc. support?

numerology commented 4 years ago

In a multi-user, multi-tenant cluster with multiple namespaces for resource isolation, can pipelines be run in different namespaces, similar to what other Kubeflow controllers like the notebook controller, TFJob, etc. support?

Just to confirm, does that mean installing multiple KFP instances in the same namespace?

nrchakradhar commented 4 years ago

No. KFP is installed in the kubeflow namespace, and workflow creation should then happen in other namespaces as requested by users. The notebook controller runs in the kubeflow namespace, but Jupyter notebooks are spawned in user namespaces.

numerology commented 4 years ago

workflow creation should then happen in other namespaces as requested by users.

I see. The closest functionality we currently have is multi-user support, where different users can submit workflows to different namespaces. One minor difference is that namespaces are protected by authentication, so users cannot submit a run to a bare namespace without supplying credentials.
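
For illustration, in such a multi-user deployment a namespaced submission looks roughly like this sketch (the host, cookie value, and namespace are hypothetical, and it requires a KFP version whose API server understands the NAMESPACE resource reference):

import kfp

# Rough sketch of a multi-user submission; authentication protects each
# namespace, so the client must carry valid credentials.
client = kfp.Client(
    host="http://ml-pipeline-ui.kubeflow",  # hypothetical endpoint
    cookies="authservice_session=<token>",  # hypothetical session cookie
)
client.create_run_from_pipeline_func(
    pipeline_func=calc_pipeline,            # assumed defined elsewhere
    arguments={"a": "7", "b": "8"},
    namespace="hkumara",                    # target user namespace
)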

@Bobgy @IronPan can better comment on this.

numerology commented 4 years ago

/assign @Bobgy
/assign @IronPan

numerology commented 4 years ago

/cc @chensun

Bobgy commented 4 years ago

Does https://github.com/kubeflow/pipelines/issues/2397 work for your use case? Work on it is in progress.

eterna2 commented 4 years ago

Did you try kfp.Client(namespace=xyz)? This lets you connect to the KFP API server in the namespace you want, and the run will inherit the namespace from that API server.

Note that I am running multiple customized Kubeflow deployments in different namespaces on my cluster.

I am a few versions behind, but I was able to run jobs in different namespaces by pointing to the API server in the corresponding namespace.
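
A minimal sketch of this setup (the "team-a" namespace is hypothetical, calc_pipeline is assumed to be defined elsewhere, and each namespace is assumed to host its own KFP API server):

import kfp

# Connect to the KFP API server deployed in the "team-a" namespace;
# runs submitted through this client execute in that namespace.
client = kfp.Client(namespace="team-a")
client.create_run_from_pipeline_func(
    pipeline_func=calc_pipeline,
    arguments={"a": "7", "b": "8"},
)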

nrchakradhar commented 4 years ago

I guess that would work if we had KFP installed in each namespace. As part of the Kubeflow installation, KFP is installed in the 'kubeflow' namespace. Currently the Kubeflow installation is considered an "uber" installation (one Kubeflow per Kubernetes cluster), so we have not installed multiple KFPs. If we download the generated workflow and include "namespace" in its metadata, the KFP installed in the kubeflow namespace is able to launch workflows in user-specified namespaces. This workaround is not a good one from a user-experience point of view.
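
For illustration, the workaround amounts to something like this sketch (it assumes the compiled workflow has been extracted to pipeline.yaml; "team-a" is a hypothetical user namespace):

import yaml

# Patch the downloaded workflow so it is created in a user namespace,
# then re-submit the file (e.g. via kubectl create -f).
with open("pipeline.yaml") as f:
    workflow = yaml.safe_load(f)

workflow["metadata"]["namespace"] = "team-a"  # inject the target namespace

with open("pipeline_namespaced.yaml", "w") as f:
    yaml.safe_dump(workflow, f)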

nrchakradhar commented 4 years ago

Does https://github.com/kubeflow/pipelines/issues/2397 work for your use case? Work on it is in progress.

Does this provide resource isolation? What I mean is that different teams have been allocated quotas in their namespaces, and it should not happen that one valid user exhausts all resources.

Bobgy commented 4 years ago

That wasn't in the scope of this design. I'm not familiar with this, but I think Kubernetes should be the system that handles resource quotas.

nrchakradhar commented 4 years ago

That wasn't in the scope of this design. I'm not familiar with this, but I think Kubernetes should be the system that handles resource quotas.

#2397 provides easier viewing in the GUI for users. Our concern is triggering pipelines (workflows) in user namespaces, which I believe is not addressed by #2397.

Also, as part of the Kubeflow installation, only one KFP is installed in the kubeflow namespace; user namespaces do not have any Kubeflow-related controllers deployed.

hemantha-kumara commented 4 years ago

Did you try kfp.Client(namespace=xyz)? This lets you connect to the KFP API server in the namespace you want, and the run will inherit the namespace from that API server.

@eterna2 We are running pipelines on a Kubeflow cluster where the pipeline controller runs only in the kubeflow namespace. Since no service or controller runs in other namespaces, the client cannot be created.

My initial question was: how do we specify the namespace when only one controller is running in the cluster?

Bobgy commented 4 years ago

@nrchakradhar Sorry, I wanted to paste a different link: https://github.com/kubeflow/pipelines/issues/1223.

That issue's solution is running pipelines in user namespaces using a single controller in the cluster.

nrchakradhar commented 4 years ago

@Bobgy That issue looks to address the multi-namespace scenario. The discussion there seems to have digressed a little from its original description. If both the multi-user and multi-namespace scenarios are addressed, it would definitely be very useful for all enterprise users of Kubeflow.

Bobgy commented 4 years ago

Good to know. I will keep updating that issue with the latest findings.

Shall we close this issue then?

hemantha-kumara commented 4 years ago

@Bobgy Thanks for pointing out the issue that addresses the multi-user and multi-namespace scenarios. Yes, we can close this issue.