kubeflow / manifests

A repository for Kustomize manifests
Apache License 2.0
772 stars 836 forks source link

authorize requests to ml-pipeline endpoint if contains any trusted principals #2753

Closed kromanow94 closed 4 days ago

kromanow94 commented 1 week ago

Pull Request Template for Kubeflow manifests Issues

Summary

Allow access to ml-pipeline endpoint if the request contains any trusted principal.

Due to changes introduced in https://github.com/kubeflow/manifests/pull/2544, the kubeflow-userid header is always populated with user/machine details. This is in conflict with the AuthorizationPolicy/ml-pipeline which is configured to only allow access from istio-ingressgateway and also block if the kubeflow-userid header is present. In the past it was made like so for security reason to block header injection, which might a form of attacking the KFP API.

With the changes introduced in the beforementioned PR, Istio will now always verify if the request is authenticated through JWT and will always put the email or sub claim in the header, mitigating the attack through header injection and enforcing user authentication.

Because of that, the AuthorizationPolicy/ml-pipeline has to be updated to allow API Access from in-cluster KF Notebook from any user authorized with JWT.

✏️ A brief description of the changes

πŸ› If this PR is related to an issue, please put the link of the issue here.

βœ… Unit Test Checklist

βœ… Contributor checklist

rimolive commented 1 week ago

Looks like it's good to merge, but can you sign-off the commit?

kromanow94 commented 1 week ago

@rimolive I'm adding tests for gh-actions to run a test pipeline from kf notebook. I have things worked out already, just doing some small cosmetical changes. I'll have it ready in a few hours. Then I'll push to this PR and we should be good to merge.

kromanow94 commented 1 week ago

Hey, so the scope of changes and acceptance criteria seems to be complete but it seems we've hit some limitation of the underlying infra...

Warning  WorkflowNodeError      6s    workflow-controller  Error node addition-pipeline-fnrwn.root.add.executor: Error (exit code 1): failed to put file: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.

https://github.com/kubeflow/manifests/actions/runs/9611836057/job/26511255711?pr=2753

I'm not sure how to proceed from this moment. The test right now fails if the pipeline fails but the test itself can as well just focus on if the pipeline run was successfully created from KF Notebook with plain client = kfp.Client() (so no additional credentials or any configuration to run the test).

@rimolive , @juliusvonkohout , any advice? Do you consider we can drop the requirement to fail the gh action if kf pipeline failed?

kromanow94 commented 1 week ago

@rimolive , @juliusvonkohout, I know the time is short so I changed the test for now to accept failure of pipeline run. The main goal of verifying if Pipeline Run can be started from KF Notebook without any additional configuration to the kfp.Client has succeeded.

If you accept it in this form, feel free to merge. If not, let me know so I can revert the last commit. Also, I added @juliusvonkohout to my repository because in just a few hours I'm going for a little weekend retreat and I'm not going to be available. If time is critical, feel free to make changes on the branch/PR directly.

kromanow94 commented 1 week ago

Oh, but now the test has succeeded and the pipeline run didn't fail... Ok, if this fails only sometimes, I prefer to have the test failed if the pipeline failed. We can always retry.

2024-06-21 11:21:35,027 - run_and_wait_for_pipeline - INFO - Pipeline Run finished in state: SUCCEEDED.
2024-06-21 11:21:35,027 - run_and_wait_for_pipeline - INFO - Pipeline Run finished with error: None.

https://github.com/kubeflow/manifests/actions/runs/9612748813/job/26514028934?pr=2753

kromanow94 commented 1 week ago

Now it randomly failed again.

System.IO.IOException: No space left on device : '/home/runner/runners/2.317.0/_diag/Worker_20240621-112805-utc.log'
You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 7 MB

https://github.com/kubeflow/manifests/actions/runs/9612888226?pr=2753

Can somebody retry, please?

juliusvonkohout commented 1 week ago

Thank you very much. At least we have a test now, which is better than the previous state.

juliusvonkohout commented 4 days ago

@kimwnasptd any objections?

juliusvonkohout commented 4 days ago

@rimolive i will review this PR until Monday July 1 first.

juliusvonkohout commented 4 days ago

@kromanov we also need negative test to check that it does not work without the proper token, but just the faked userid header. Lets do that in a follow up PR.

/lgtm /approve

since we have to get RC.2 ready and Kimonas approved it in https://github.com/kubeflow/manifests/issues/2747#issuecomment-2151541954. We have to do a full assessment next week.

google-oss-prow[bot] commented 4 days ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubeflow/manifests/blob/master/OWNERS)~~ [juliusvonkohout] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment