kubeflow / manifests

A repository for Kustomize manifests
Apache License 2.0
808 stars 873 forks source link

Using external Auth (APPID) get 403 on notebook and tensorboard creation #1812

Closed moficodes closed 3 years ago

moficodes commented 3 years ago

issue from kubeflow: https://github.com/kubeflow/kubeflow/issues/5803

The APP_SECURE_COOKIES is not set and is set in code to true by default.

This means until we setup https we wont be able to create notebook or tensorboard on non dex auth.

The error looks like

Could not find CSRF cookie XSRF-TOKEN in the request. http://10.14.0.121/jupyter/api/namespaces/kubeflow-user/notebooks

We should make this into a params.env variable and make it configurable.

davidspek commented 3 years ago

@moficodes Yeah I was expecting a lot of people would run into this. For my distribution I fixed this problem by having the kubeflow-self-signing-issuer used by the components create certificates for the istio ingress-gateway as well and defaulting to HTTPS.

@yanniszark Given that many people are going to run into this issue and most likely create a lot of issues, should I open up a PR for defaulting the ingress gateway to HTTPS with a certificate from the kubeflow-self-signing-issuer?

moficodes commented 3 years ago

@DavidSpek thanks.

yanniszark commented 3 years ago

@moficodes @DavidSpek I see two distinct issues here:

Let me answer each one:

  • One for making changes to individual app settings.

@moficodes to make changes to the default Tensorboard Web App (or other web app), you can use standard kustomize patches. Here are some example: https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/patchesstrategicmerge/ https://github.com/kubeflow/manifests/blob/master/apps/profiles/upstream/overlays/kubeflow/patches/kfam.yaml

If you need some help on using kustomize patches, feel free to ping me and we can arrange a quick meet. Please tell me if anything is not clear.

  • One for using a self-signed certificate by default.

For this issue, I am inclined to decide against including such a thing. Exposing Services correctly to the outside world is a deeply platform/environent-specific process and each distribution should include instructions if they want to guide their users through exposing their Kubeflow installations correctly with HTTPS. This is the reason why we have only included the port-forward method in the installation guide. Perhaps we should make the guide more clear and say that exposing a Service to the outside world depends on the environment/platform and is out of scope.

davidspek commented 3 years ago

@yanniszark I think you are making the assumption that everybody is exposing Kubeflow to the outside world. There is like a large group of people that will be used to using the NodePort directly or MetalLB in their on-prem/testing or homelab cluster while trying to setup Kubeflow. All these people will run into this issue, and will create issues because they run into this very problem. Indeed, exposing Kubeflow and getting a certificate is something distributions will need to provide instructions for. But it is also up to the distributions to replace/remove the certificate in the manifests provided in this repository. The manifests in this repository, along with the instructions, are there to help distribution owners and users get setup quickly with a working installation. Not defaulting the gateway to HTTPS is just going to cause more issues like this one and waste time and cause frustration among both users and maintainers.

moficodes commented 3 years ago

@yanniszark I am fine with adding the custom-env and patch merge the env var in the deployment in our distribution.

As long as this is mentioned in some place everyone can see I think its fine. Specially for maintainers of distribution and people installing for themselves from scratch.

davidspek commented 3 years ago

So I guess the question becomes, is it better to have insecure cookies and HTTP by default and maintainers/users will need to dig through the manifests to fix this, or use secure cookies and HTTPS by default and maitainers/users just need to edit the certificate so it suites their needs. I think following industry best practice would be to have secure cookies and HTTPS as a default.

yanniszark commented 3 years ago

So I guess the question becomes, is it better to have insecure cookies and HTTP by default and maintainers/users will need to dig through the manifests to fix this, or use secure cookies and HTTPS by default and maitainers/users just need to edit the certificate so it suites their needs. I think following industry best practice would be to have secure cookies and HTTPS as a default.

I agree that people will try to expose Kubeflow over HTTP with NodePort / LoadBalancer / Ingress. Because of that, we need to communicate to users what will happen in this case. In other words, we should amend the wg-manifests readme to tell users:

Then it is clear for users what they need to do. Since setting up proper HTTPS is inherently a process bound to the specific environment at hand, we should defer that to each distribution. So for setting up HTTPS, we will point the user to look at a distribution's docs, which will focus on different methods of setting up HTTPS (e.g., on GCP, AWS, on-prem, etc.)

davidspek commented 3 years ago

You cannot expose Kubeflow web apps over HTTP using NodePort / LoadBalancer / Ingress, but only with port-forward.

This causes a huge problem for anybody that does not use their local workstation to setup Kubeflow and moving cluster credentials from one machine to another is a problem due to enterprise security policies. Adding to this, NodePort is used for the default configuration of the Istio ingress-gateway. As such, people will expect to be able to access and use Kubeflow using the NodePort, if only for initial testing of the deployment. When looking at it from this perspective (keep in mind people often miss things while reading), the current manifests do not work Out-of-the-Box wasting time and causing user frustration.

Since setting up proper HTTPS is inherently a process bound to the specific environment at hand, we should defer that to each distribution.

I can't argue with this. However, I think you over simplified the actual situation. If I am not mistaken, most cloud bases distributions will probably use cert-manager and have users setup a ClusterIssuer with their respected ACME integrations. If that is indeed the case, or this setup is recommended, the CR for cert-manager to generate the certificate will be used by most distributions/deployments and only need slight modifications (with an overlay or PatchStrategicMerge) to edit the ClusterIssuer and desired domain(s) in the certificate CR.

yanniszark commented 3 years ago

@DavidSpek thank you for your input. Let's move forward by providing clear documentation and then we can re-evaluate after the release, according to the feedback that we get.

davidspek commented 3 years ago

@yanniszark Sounds like a good plan.

davidspek commented 3 years ago

@yanniszark Another occurrence of somebody running into this problem while busy with setting up the OpenShift distribution: https://github.com/kubeflow/kubeflow/issues/5803#issuecomment-819062233