blublinsky commented 4 years ago

/kind bug

When trying to change namespace on Kubeflow main page, the drop down of namespaces is showing behind the left blue pane, which completely obstructs the list

Kubeflow version: 0.6
kfctl version: kfctl v0.6.0-0-g71aea0a9
Kubernetes platform: OpenShift
Kubernetes version: kubernetes v1.13.4+c62ce01

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.95. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

bbhuston commented 4 years ago

Experiencing same issue when installing Kubeflow into an existing GKE cluster

Kubeflow version: 0.6.1-rc.2 kfctl version: kfctl v0.6.1-rc.2-1-g3a37cbc6 Kubernetes platform: GCP/GKE (install into existing cluster) Kubernetes version: kubernetes v1.13.4

bbhuston commented 4 years ago

Experiencing same issue when installing Kubeflow into an existing GKE cluster

Kubeflow version: 0.6.1-rc.2 kfctl version: kfctl v0.6.1-rc.2-1-g3a37cbc6 Kubernetes platform: GCP/GKE (install into existing cluster) Kubernetes version: kubernetes v1.13.4

Just tried installing with the same setup as above but using version 0.6.2-rc.2. Same issue with namespace selection

shawnho1018 commented 4 years ago

I encountered similar issue. However, I could not judge if the drop down of namespaces is showing behind the left blue pane. For me, the namespace selection is just not working. When I click "Select namespace", there is a only a button animation (blue color appeared on white button) but no drop-down selection at all.

my kfctl version: v0.6.1-rc.2-1-g3a37cbc6 kubernetes platform: Vanilla kubernetes installed from kubeadm Kubernetes version: kubernetes v.1.15.1

jlewi commented 4 years ago

/assign @avdaredevil

@avdaredevil Is this the same issue as #3698?

avdaredevil commented 4 years ago

Yes, it's the same and has been resolved with the latest release.

@shawnho1018 seems to have a different issue, and that seems to be based on no namespace list being returned to the dashboard (might be a networking / permissioning issue)

shawnho1018 commented 4 years ago

Yes, it's the same and has been resolved with the latest release.

@shawnho1018 seems to have a different issue, and that seems to be based on no namespace list being returned to the dashboard (might be a networking / permissioning issue)

I also suspected it was the serviceaccount issue but I don't see any error log in kubectl logs -n kubeflow -f centraldashboard-79df4778f7-glhfj

kubeflow-centraldashboard@0.0.2 start /app
 npm run serve

kubeflow-centraldashboard@0.0.2 serve /app
node dist/server.js

Initializing Kubernetes configuration
"other" is not a supported platform for Metrics
Using Profiles service at http://10.101.82.170:8081/kfam
Server listening on port http://localhost:8082

As for networking, all master/worker nodes are routable to each other and I use Calico as my CNI. I logged onto centraldashboard pod and try to access kubernetes api-server to retrieve namespaces using the token in /var/run/secrets/kubernetes.io/serviceaccount/token. The result is positive.

wget -O- https://10.66.202.112:6443/api/v1/namespaces --no-check-certificate --header "Authorization: Bearer $token"
Connecting to 10.66.202.112:6443 (10.66.202.112:6443)
{
  "kind": "NamespaceList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces",
    "resourceVersion": "230482"
  },
  "items": [
    {
      "metadata": {
        "name": "default",
        "selfLink": "/api/v1/namespaces/default",
        "uid": "4b35a170-1a50-4e0b-9c38-b7125ac5c3fa",
        "resourceVersion": "148",
        "creationTimestamp": "2019-08-10T03:25:01Z"
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    },
    {
      "metadata": {
        "name": "istio-system",
        "selfLink": "/api/v1/namespaces/istio-system",
        "uid": "1ce730fa-8fb7-4afa-9ac7-205243389d98",
        "resourceVersion": "218634",
        "creationTimestamp": "2019-08-11T12:22:15Z",
        "labels": {
          "istio-injection": "disabled"
        }
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    },
    {
      "metadata": {
        "name": "kube-node-lease",
        "selfLink": "/api/v1/namespaces/kube-node-lease",
        "uid": "cee08b26-73bf-4c0e-a84c-f25a6703eae8",
        "resourceVersion": "6",
        "creationTimestamp": "2019-08-10T03:24:58Z"
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    },
    {
      "metadata": {
        "name": "kube-public",
        "selfLink": "/api/v1/namespaces/kube-public",
        "uid": "578fc391-0fce-4058-b1f8-ed150366143e",
        "resourceVersion": "5",
        "creationTimestamp": "2019-08-10T03:24:58Z"
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    },
    {
      "metadata": {
        "name": "kube-system",
        "selfLink": "/api/v1/namespaces/kube-system",
        "uid": "2fcd4d7b-ea50-465c-b86b-ee9073d83ee1",
        "resourceVersion": "4",
        "creationTimestamp": "2019-08-10T03:24:58Z"
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    },
    {
      "metadata": {
        "name": "kubeflow",
        "selfLink": "/api/v1/namespaces/kubeflow",
        "uid": "3f5af332-84a5-46d5-a78f-04e228489e36",
        "resourceVersion": "130865",
        "creationTimestamp": "2019-08-10T23:51:32Z"
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    },
    {
      "metadata": {
        "name": "local-path-storage",
        "selfLink": "/api/v1/namespaces/local-path-storage",
        "uid": "33877263-24b7-49f5-815b-082c80fd0864",
        "resourceVersion": "117603",
        "creationTimestamp": "2019-08-10T22:41:57Z"
      },
      "spec": {
        "finalizers": [
          "kubernetes"
        ]
      },
      "status": {
        "phase": "Active"
      }
    }
  ]

Still puzzled on the issue. Any recommendation on which pod might worth to further investigate?

bbhuston commented 4 years ago

@jlewi @avdaredevil Thanks for looking into this issue. In contrast to what I previously posted, I am actually hitting the exact same issue as @shawnho1018. (I originally has just assumed the namespace drop-down was hidden somewhere, but in fact it's not even selectable.)

Having said this, I do not believe the issue is caused by a "roll-your-own-kubernetes" networking setup that @shawnho1018 has with his kubeadm installation (because I'm hitting this on a plain-vanilla GKE cluster), nor do I believe that it was fixed in the latest release -- I am hitting this same issue even after doing a bleeding edge install of the kfctl manfests (i.e., by setting --version="v0.6.2-rc.0" at the kfctl init step).

The steps to reproduce my exact issue on GKE are listed below

# Generate legacy default auth credential file for use with terraform 
gcloud auth application-default login

# Download latest terraform client, if not already present
brew install terraform

# Create terraform file that uses  [official GCP GKE module](https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google/4.1.0/submodules/beta-private-cluster)
cat << EOF > main.tf
module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google//modules/beta-public-cluster"
  version                    = "4.1.0"
  kubernetes_version         = "1.13.7-gke.8"
  project_id                 = "my-gcp-project"
  name                       = "my-test-cluster"
  regional                   = true
  region                     = "us-east1"
  zones                      = ["us-east1-b", "us-east1-c", "us-east1-d"]
  network                    = "default"
  subnetwork                 = "default"
  ip_range_pods              = ""
  ip_range_services          = ""
  service_account            = "my-service-acount"
  http_load_balancing        = true
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = false
  network_policy             = false
  istio                      = false
  cloudrun                   = false

  node_pools = [
    {
      name               = "default-node-pool"
      machine_type       = "n1-standard-2"
      min_count          = 1
      max_count          = 100
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"
      auto_repair        = true
      auto_upgrade       = false
      service_account    = "my-service-account"
      preemptible        = false
      initial_node_count = 1
    },
  ]

  node_pools_oauth_scopes = {
    all = []
    default-node-pool = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  node_pools_labels = {
    all = {}
    default-node-pool = {
      default-node-pool = true
    }
  }

  node_pools_metadata = {
    all = {}
    default-node-pool = {
      node-pool-metadata-custom-value = "my-node-pool"
    }
  }

  node_pools_taints = {
    all = []
    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = true
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []
    default-node-pool = [
      "default-node-pool",
    ]
  }
}
EOF

# Create GKE cluster via standard terraform client commands
terraform init
terraform plan
terraform apply

### Install kubeflow into existing cluster as per official guidance ###

# download kfctl
wget -O kfctl_v0.6.1_darwin.tar.gz https://github.com/kubeflow/kubeflow/releases/download/v0.6.1/kfctl_v0.6.1_darwin.tar.gz
tar -xvf kfctl_v0.6.1_darwin.tar.gz

# create generic k8s kubeflow app
CONFIG="https://raw.githubusercontent.com/kubeflow/kubeflow/master/bootstrap/config/kfctl_k8s_istio.yaml"
KFAPP=my-test-app
./kfctl init ${KFAPP} --config=${CONFIG} --version="v0.6.2-rc.0" -V

# generate and deploy generic k8s kubeflow manifests
cd ${KFAPP} 
./../kfctl generate all -V
./../kfctl apply all -V

# hot-fix the missing service account and cluster rolebinding issues, as per related issue tickets
kubectl create sa default-editor -n kubeflow
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user default-editor

# swap out NodePort istio-ingressgateway service for LoadBalancer one (I'm unable to port-forward correctly with the NodePort service for some reason)
kubectl -n istio-system delete svc istio-ingressgateway 

cat << EOF >  istio-ingressgateway-lb.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    beta.cloud.google.com/backend-config: '{"ports": {"http2":"iap-backendconfig"}}'
  labels:
    app: istio-ingressgateway
    chart: gateways
    heritage: Tiller
    istio: ingressgateway
    release: istio
  name: istio-ingressgateway
  namespace: istio-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: status-port
    # nodePort: 30513
    port: 15020
    protocol: TCP
    targetPort: 15020
  - name: http2
    # nodePort: 31380
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    # nodePort: 31390
    port: 443
    protocol: TCP
    targetPort: 443
  - name: tcp
    # nodePort: 31400
    port: 31400
    protocol: TCP
    targetPort: 31400
  - name: https-kiali
    # nodePort: 30453
    port: 15029
    protocol: TCP
    targetPort: 15029
  - name: https-prometheus
    # nodePort: 30584
    port: 15030
    protocol: TCP
    targetPort: 15030
  - name: https-grafana
    # nodePort: 31827
    port: 15031
    protocol: TCP
    targetPort: 15031
  - name: https-tracing
    # nodePort: 32653
    port: 15032
    protocol: TCP
    targetPort: 15032
  - name: tls
    # nodePort: 31118
    port: 15443
    protocol: TCP
    targetPort: 15443
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
    release: istio
  sessionAffinity: None
  type: LoadBalancer
EOF

kubectl apply -f istio-ingressgateway-lb.yaml

# port-forward istio-ingressgateway and navigate to central dashboard
kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80 &
echo 'navigating to "http://localhost:8080/" takes one to the central dashboard and namespaces are unselectable'

# nuke cluster
terraform destroy
rm -rf .terraform/ && rm terraform.*
echo "Peace!  I'm out"

deepdad commented 4 years ago

I don't see the dropdown. It is just stuck. The browser network does show blocking google fonts.

Using minikube on a Mac. Is there a workaround to create a notebook server + namespace?

shawnho1018 commented 4 years ago

I don't see the dropdown. It is just stuck.

Using minikube on a Mac. Is there a workaround to create a notebook server + namespace? @deepdad where do you see the error log? Could you point to me? My symptom looks the same like yours. I am waiting for @avdaredevil and @jlewi but it would be nice to diagnose if my error is the same as yours.

deepdad commented 4 years ago

That's in the browser @shawnho1018 developer tools It may also be that I don't share the /Users folder with Virtualbox. That has too much access rights. For the rest here's a history, it has some tiny differences with the docs (no ambassador, forwarding is different and this combines tow pages of the docs but I didn't change any of the files while it is sometimes the case that the minikube ip is hard coded in K8 files):

cd kubeflow/

deepdad commented 4 years ago

This bug is linked to this code:

window.customElements.define("namespace-selector", class  extends a.a{
                static get template()
                {
                    return Object(a.b)(C())
                }
                static get properties()
                {
                    return {
                        queryParams: Object,
                        namespaces: Array,
                        selected: {
                            type: String,
                            observer: "_onSelected",
                            value: "",
                            notify: !0
                        },
                        allNamespaces: {
                            type: Boolean,
                            value: !1
                        },
                        selectedNamespaceIsOwned: {
                            type: Boolean,
                            readOnly: !0,
                            notify: !0,
                            value: !1
                        }
                    }
                }
                static get observers()
                {
                    return ["_queryParamChanged(queryParams.ns)", "_ownedContextChanged(namespaces, selected)"]
                }
                isOwner(e)
                {
                    return "owner" == e
                }
                getNamespaceText(e, n)
                {
                    return n ? "All Namespaces" : e || "Select namespace"
                }
                _queryParamChanged(e)
                {
                    e && this.selected !== e && (this.selected = e)
                }
                _ownedContextChanged(e, n)
                {
                    const t = (e || []).find(e => e.namespace == n) || this.selectedNamespaceIsOwned;
                    this._setSelectedNamespaceIsOwned(this.isOwner(t.role))
                }
                _onSelected(e)
                {
                    this.set("queryParams.ns", e)
                }
            }
            );

It doesn't say much I guess.

avdaredevil commented 4 years ago

Hey all, could you try checking the following:

In the Networks tab in chrome debugger, can you send the responses / statuses recieved for /api/has-workgroup and /api/env-info

In your console run:

JSON.stringify(document.querySelector("main-page").shadowRoot.querySelector("namespace-selector").namespaces)

shawnho1018 commented 4 years ago

@avdaredevil

For the first checking: I could see returned value from /api/env-info but not /api/has-workgroup. The results are shown below for your reference.
Foe the console running the script you provided: Empty response.

avdaredevil commented 4 years ago

Okay this helps, so the dashboard server is unable to see any namespaces for your user. If you hover your mouse over your version (as shown below):

Can you send a snapshot of what that looks like?

Also, try running /api/workgroup/exists instead of the /api/has-workgroup

shawnho1018 commented 4 years ago

@avdaredevil

Mine is v0.0.2-655831
Not sure the meaning of the returned value but it does return.

avdaredevil commented 4 years ago

Awesome! Thanks for the quick replies @shawnho1018.

For context, the /api/workgroup/exists route allows the dashboard to quickly realize if you have IAP / OAuth headers, and if so do you have a workgroup (synonymous with a namespace for now) deployed.

Essentially from this, the dashboard concludes that the IsolationMode (for namespaces) is based on:

Single-User: Every user is a cluster-admin, and everyone sees the same list of namespaces
Multi-User: Each user is scoped to the namespaces they have access to (as owner or contributor), and if you don't own a namespace, initiate the registration-view.

/cc @lluunn @abhi-g

shawnho1018 commented 4 years ago

In that case, since I am the single user, shouldn't I simply see the all namespaces in the cluster? Still scratch my head without knowning how I could resolve this issue. Need a bit more guidance from you @avdaredevil

avdaredevil commented 4 years ago

Okay, so it seems you have no WorkgroupBindings in your network. Essentially the kfam-controller is giving the dashboard a list of 0 bindings. This is unusual to me as some system namespaces should have been created thus some bindings would exist for those. This part gets a little past my understanding of the backend.

I'd suggest pinging @lluunn and @kunmingg on Slack regarding a deeper understanding as to why kfam has no bindings.

Hope that helps?

yanniszark commented 4 years ago

I also confirm this bug.

My understanding is that this is a blocker for v0.6.2. The Dashboard lists user bindings for the namespace and gets nothing back, which makes the Central Dashboard not work without the GCP IAP Headers.

@avdaredevil can you describe in more detail exactly how the Namespace Selector is populated? What are you expecting to find? What is a WorkgroupBinding and why was it introduced?

/priority p0 cc @jlewi

kunmingg commented 4 years ago

Is this use non-GCP + new-central-dashboard? We can create a default public profile for this case to save user effort.

blublinsky commented 4 years ago

In my case, the list of namespaces is coming back. It is just unusable, being obstructed by left hand panel Boris Lublinsky FDP Architect boris.lublinsky@lightbend.com https://www.lightbend.com/

On Aug 12, 2019, at 6:52 PM, Kunming Qu notifications@github.com wrote:

Is this use non-GCP + new-central-dashboard? We can create a default public profile for this case to save user effort.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kubeflow/kubeflow/issues/3859?email_source=notifications&email_token=AGA5MG4YKFW6NZJZSIYGY7LQEHZUDA5CNFSM4IKW5WZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4EEP5A#issuecomment-520636404, or mute the thread https://github.com/notifications/unsubscribe-auth/AGA5MGYPVOMKVD3YOL6L4PLQEHZUDANCNFSM4IKW5WZA.

yanniszark commented 4 years ago

Is this use non-GCP + new-central-dashboard?

Yes, it says platform Openshift.

We can create a default public profile for this case to save user effort.

From what I know, the default profile creation currently only happens for GCP.

I believe it's important to understand what @avdaredevil was expecting to find but didn't. In addition, making a Profile doesn't fix it for me.

jlewi commented 4 years ago

@blublinsky I think the issue with the namespace selector being obstructed is a different issue; #3698.

@avdaredevil @kunmingg Why isn't the WorkGroupBindings populated? is it because of

The request has no ID headers; identifying the user?
No profiles were created?
Something else?

Can you provide some troubleshooting instructions to diagnose where the problem is ?

avdaredevil commented 4 years ago

@yanniszark Essentially workgroups are 1:1 constructs to a namespace. A workgroup holds permissioning data for a namespace. Now there are two cases:

IAP / Other Header based auth (non-GCP)

Scan the UserID headers
Ascertain User-identity
Dashboard changes view to multi-user isolation mode
- Scope namespaces based on bindings (workgroup bindings) that that user has access to.

Basic Auth / Unsecured Network

Query all bindings (every single one) from kfam (this was requested by @kunmingg and @abhi-g over using Kubernetes controller to get the list of all namespaces)
Loop over all bindings, de-duplicate the list by name.
ORM the data for visual purposes only:
- Change role in each binding to contributor (edit access)
- Change user in each binding to anonymous@kubeflow.org

Notes:

The dashboard relies on the /api/env-info call to include a field called namespaces of type SimpleBinding[] to populate the namespace selector.

Debugging thoughts

@jlewi Here are my thoughts on this:

@avdaredevil @kunmingg Why isn't the WorkGroupBindings populated? is it because of The request has no ID headers; identifying the user? No profiles were created? Something else?

There are basically no bindings at all, like not just current user (number of bindings in the entirety of the network = 0)

yanniszark commented 4 years ago

I found a way to fix this. It seems that by default, the CentralDashboard assumes the Profile KFAM Service is at localhost:8084

https://github.com/kubeflow/kubeflow/blob/7c0932a117f2f2dc232271992d0348c4747021eb/components/centraldashboard/app/server.ts#L21-L22

Changing this to profiles-kfam and 8081 seems to fix the problem, as the selector is populated. @avdaredevil why is the default address localhost:8084? Can we make it profiles-kfam:8081 instead?

fediazgon commented 4 years ago

@yanniszark Is there a way to patch a running deployment?

avdaredevil commented 4 years ago

You can change the environment variables passed to CentralDashboard so that it uses the new values

This can be done by editing the YAML in the cloud console

Edit

These env variables should have been set correctly via Kubernetes. @yanniszark

The easiest way to verify is to open an interactive shell to the Docker container running inside a Kubeflow deployment and run env | sort (and if you don't mind paste the output here)

prodonjs commented 4 years ago

The issue results from the fact that during the initial deployment, the centraldashboard Deployment is created before the profiles-service. Thus, the PROFILES_KFAM_SERVICE_HOST and PROFILES_KFAM_SERVICE_PORT are not set by Kubernetes so the dashboard falls back to the default value of localhost.

The easiest way to address the issue in a running deployment is to manually delete the centraldashboard pod and allow the Deployment to recreate it. The new Pod will be created with the appropriate environment variables set and will use it to establish the connection to the Profiles backend.

shawnho1018 commented 4 years ago

The issue results from the fact that during the initial deployment, the centraldashboard Deployment is created before the profiles-service. Thus, the PROFILES_KFAM_SERVICE_HOST and PROFILES_KFAM_SERVICE_PORT are not set by Kubernetes so the dashboard falls back to the default value of localhost.

The easiest way to address the issue in a running deployment is to manually delete the centraldashboard pod and allow the Deployment to recreate it. The new Pod will be created with the appropriate environment variables set and will use it to establish the connection to the Profiles backend.

I followed your step to delete the centraldashboard pod but the problem remains.

prodonjs commented 4 years ago

Can you share the image URI from your centraldashboard deployment as well as output from kubectl logs deployment/centraldashboard?

shawnho1018 commented 4 years ago

Can you share the image URI from your centraldashboard deployment as well as output from kubectl logs deployment/centraldashboard?

this is my log after a newly initiated pod: root@ansible:~/kubeadm-ansible# kubectl logs -f centraldashboard-79df4778f7-rnlpm -n kubeflow kubeflow-centraldashboard@0.0.2 start /app npm run serve

kubeflow-centraldashboard@0.0.2 serve /app node dist/server.js

Initializing Kubernetes configuration "other" is not a supported platform for Metrics Using Profiles service at http://10.101.82.170:8081/kfam Server listening on port http://localhost:8082

prodonjs commented 4 years ago

Interesting. The Pod definitely picked up the environment variables and should be able to establish communication with the profiles API server. That doesn't seem to be the problem here, but it is a general issue we need to address.

I'm out of my depth at this point. Passing it back to @avdaredevil to dig in a little further.

avdaredevil commented 4 years ago

@shawnho1018 Your problem arises on a different step I believe. Your network may not have any bindings at all. The recommended way to create this workgroup/profile is to do this via kfctl for now (in non-identity networks, such as yours).

Easier, but may not work approach: We could fool the dashboard into thinking you are in multi-user and trigger a registration flow by running (in your Dev Console on Central-Dashboard):

document.querySelector("main-page")._setRegistrationFlow(true)

jlewi commented 4 years ago

@prodonjs Can you open an issue about the environment variables. If we're dependent on the order of deployment that seems like a bug. The central dashboard should be community with the KFAM backend via the DNS name established by the virutal service and service defined for the backend. Those should be deterministic based on the kustomize packages.

jlewi commented 4 years ago

@avdaredevil @kunmingg

Can we reproduce this issue; IIUC we should be able to reproduce it by installing on an existing GKE cluster.

Are these changes related to the addition of the kfam API?

It looks like people are hitting this issue now with 0.6.1 which should not be picking up the KFAM API changes.

My guess is that the configs somewhere are not sufficiently pinned so people using releases prior to 0.6.2 are picking up some cherry picks.

if so; do we need to roll back some changes until we've tested further?

jlewi commented 4 years ago

Here's my summary of the issue so far

The original issue on the thread is the namespace UI selection being half hidden which is #3698
- This should potentially be fixed by the latest central dashboard image
A secondary issue that people are reporting is that they are not seeing any namespaces
- This appears to be due to the fact that they are picking up the latest central dashboard image which is now talking to the KFam API to identify which namespaces to display
- It seems like there are two issues here
  
  i) central dashboard may not be configured with the proper IP address of the KFam API ii) They may not be creating any profiles so not have any namespaces which they can access
Users of v0.6.1 are picking up the central dashboard changes because our configs are not pinned to specific commits by
- kfctl_gcp_iap.yaml on v0.6 branch is pulling in master of kubeflow/manifests https://github.com/kubeflow/kubeflow/blob/55d9d6dc1f8d5274e0d35e25655fc634637958e2/bootstrap/config/kfctl_gcp_iap.yaml#L298
- kfctl_k8s_istio.yaml is pulling the latest of kubeflow/manifests/v0.6-branch https://github.com/kubeflow/kubeflow/blob/55d9d6dc1f8d5274e0d35e25655fc634637958e2/bootstrap/config/kfctl_k8s_istio.yaml#L148
- It looks like the v0.6 branch and also master were recently updated to pull in the new central dashboard image (see kubeflow/manifests#265)

Suggested Mitigation

The new central dasbhoard UI functionality needs more testing and probably documentation before we give it to users
Users not using an RC should not be picking up the new dashboard
We should pin kubeflowmanifests on v0.6-branch to kubeflow/manifests commit before the update to central dashboard image

What do folks think?

@yanniszark I think you were working on a PR to pin the kubeflow/manifests used on the v0.6-branch?

yanniszark commented 4 years ago

@jlewi the PR got merged and fixed it. It seems that it was broken by this PR: https://github.com/kubeflow/kubeflow/pull/3892 cc @kunmingg

jlewi commented 4 years ago

@yanniszark Which PR was the fix? And how did #3892 fix it?

yanniszark commented 4 years ago

Sorry, I forgot to mention.

3867 fixed the issue and pinned all configs to v0.6-branch, but #3892 pinned some configs to master again.

However, getting https://github.com/kubeflow/kubeflow/pull/3894 merged should again pin everything to a specific tag.

jlewi commented 4 years ago

@shawnho1018 configs should now pin to an earlier version of kubeflow/manifests which should avoid picking up the changes to the central dashboard

I think that means you won't get the fix for the namespace selector being obscured fix.

3893 is tracking verifying all the auth scenarios work with 0.6.2

.

deepdad commented 4 years ago

-

jlewi commented 4 years ago

@deepdad I think you deleted/or hid you comment

I don't really follow the pinning. Repeating the commands that previously worked no longer do. The output looks different. Ending with couldn't create KfApp: (kubeflow.error): Code 400 with message: invalid name due to a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'm .

What did you give as the name for your KFAapp? I think the problem is your name isn't valid so you probably just need to fix the name.

beatgeek commented 4 years ago

Seeing this same issue on build version v0.6.2-rc.0 with a deployment to existing cluster. (BTW the docs reference a port-forward using the ambassador service but there is no such service it's istio-ingress).

GPC GKE - install on existing cluster

bbhuston commented 4 years ago

Seeing this same issue on build version v0.6.2-rc.0 with a deployment to existing cluster. (BTW the docs reference a port-forward using the ambassador service but there is no such service it's istio-ingress).

GPC GKE - install on existing cluster

Just as a friendly update to a previous post of mine, I should mention that I'm still experiencing the same issue as @beatgeek. I tried installing v0.6.2-rc.0 into an existing GKE cluster once again but with no luck. The exact code I used to hit this error is listed here -- https://github.com/kubeflow/kubeflow/issues/3859#issuecomment-520231946

avdaredevil commented 4 years ago

Central Dashboard was updated to surface any errors: https://github.com/kubeflow/kubeflow/pull/3933 [image-pr].

sylus commented 4 years ago

This is also still happening to me launching latest central dashboard and 0.6.2-rc.1. Is anyone actually able to use a release recently?

sylus commented 4 years ago

Tried again against 0.6.2-rc.2 and latest centralsdashboard but namespace is still empty.

Is there anyway to make this work or have to wait for full 0.6.2 release?

Capture

jlewi commented 4 years ago

@sylus You should use v0.6.1; v0.6.1 should be pinned to the v0.6.1 release of the central dashboar.

If you are using v0.6.1 and not running on GCP you will need to create a profile via kubectl to create a namespace. https://www.kubeflow.org/docs/other-guides/multi-user-overview/#step-2-creating-a-user-profile

sylus commented 4 years ago

Unfortunately 0.6.1 has a bug in JupyterHub where if you use a custom namespace it won't appear in the dropdown that is fixed in a later release. Thanks for quick response :D

kubeflow / kubeflow

Kubeflow main page namespace selection #3859

IAP / Other Header based auth (non-GCP)

Basic Auth / Unsecured Network

Notes:

Debugging thoughts

Edit

3867 fixed the issue and pinned all configs to v0.6-branch, but #3892 pinned some configs to master again.

3893 is tracking verifying all the auth scenarios work with 0.6.2