webmeshproj / webmesh-vdi

A Kubernetes-native Virtual Desktop Infrastructure
GNU General Public License v3.0

Starting a template from UI for a certain user results in nothing #25

Closed: hananmoiseyev closed this issue 3 years ago

hananmoiseyev commented 3 years ago

I have a few users; they are all using the same Docker image and the same desktop template (with very small diffs), and they all worked fine.

Two hours ago, one user showed me that starting a session via the UI just results in the "Waiting" screen. Looking at the Kubernetes dashboard, I can see that a Service is there but no Pod at all. I can't tell if there are logs anywhere, but it's just strange.

Other users can use the same template, but it takes a while to start, with lots of Pod restarts...

Any clue what is going on?

tinyzimmer commented 3 years ago

Oh it's spec.userdataSelector. It isn't underneath userdataSpec, since that is just an alias to a PVCSpec.

tinyzimmer commented 3 years ago

https://github.com/tinyzimmer/kvdi/blob/main/doc/appv1.md#vdiclusterspec - It replaces userdataSpec when you use it.
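A hedged sketch of making that change, assuming the VDICluster created by this chart release is named kvdi; the selector's sub-fields are omitted here since they are documented at the link above:

kubectl get vdicluster kvdi -o yaml   # confirm how the spec currently looks
kubectl edit vdicluster kvdi          # then move the selector to the top level of the spec:
# spec:
#   userdataSelector:                 # a sibling of (and replacement for) userdataSpec
#     ...                             # selector fields as described in doc/appv1.md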

hananmoiseyev commented 3 years ago

ok, got it.

But it seems like it's not able to load now. The PVC is bound to the pod, but the pod describe shows:

Warning  FailedMount  51s        kubelet, k133      Unable to attach or mount volumes: unmounted volumes=[home], unattached volumes=[cpbclouds-agplenus run-lock run vnc-sock home cloud-home cloud-tmp tls default-token-pfpdb shm cgroups]: timed out waiting for the condition
tinyzimmer commented 3 years ago
timed out waiting for the condition

is usually an issue on the storage-controller side. You'd want to describe the PV for the claim and try to figure out what's wrong with it. It does look like the manager did its part of the bargain correctly.
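For reference, a hedged sketch of that triage; the claim and namespace names below are placeholders for the home volume from the event above:

kubectl get pvc -n <namespace>                     # note the VOLUME column for the home claim
kubectl describe pvc <home-claim> -n <namespace>   # events here usually show attach/mount failures
kubectl describe pv <volume-name>                  # then check the provisioner/attachment side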

hananmoiseyev commented 3 years ago

OK, the issue was that I had mounted the same PVC in the template YAML. Works like a charm now!

tinyzimmer commented 3 years ago

Yeah, that actually veers into another thing that I'm still on the fence about. Templates really should be validated more thoroughly when they are created, because things like this could be caught programmatically when you try to edit/create them in the first place. Then, when you get an error telling you exactly what is wrong, it's easier to fix.

Doing that would introduce other problems though. At the very least I'm exploring an opt-in option for better template validation.

hananmoiseyev commented 3 years ago

Fair enough

I am now giving Longhorn storage another chance as a PV. Let's see how well it does.

The last issue I have is that, from the web UI, I cannot connect to a session that was opened (but is not currently in use) from another browser session.

tinyzimmer commented 3 years ago

That's by design, and like I said up there, I might consider making it an opt-out

Right now locks are taken for every display/audio stream so that there can only be one of each at any time per desktop. I can see maybe looking into making this optional, but it would be a bit involved. Those same locks are used when the API is querying session status (but that could be adapted).

tinyzimmer commented 3 years ago

I'll add that, to get around this, you can just click away from the control tab in the window you aren't using. Then you can connect to it in the other tab.

hananmoiseyev commented 3 years ago

Ok.

Btw, did you consider supporting a sleep/hibernate option? That way, sessions will be able to resume.

tinyzimmer commented 3 years ago

For containers, that idea isn't so straightforward.

For the QEMU stuff it's certainly possible and I could think of some interesting ways to accomplish it or something similar. But that's because you have actual ACPI events and virtual hardware to work with.

In containers, there is no virtual hardware or kernel. It's all the same as the host. They are just a bunch of processes running in their own namespaces. There is nothing to tell to "sleep" or "hibernate"; the processes are either running or they aren't, and the usual things they listen to for such events aren't there. You could effectively accomplish the same thing just by closing your browser tab or clicking to another screen in kvdi. That disconnects the display/audio (leaving all your windows inside as they are) and then the session itself just sits there.

I know it's not the best solution, but I can't think of many other ways to accomplish that behavior off the top of my head.

hananmoiseyev commented 3 years ago

You could effectively accomplish the same thing just by closing your browser tab or clicking to another screen in kvdi. That disconnects the display/audio (leaving all your windows inside as is) and then the session itself just sits hanging there.

I can't seem to make this happen. Any time I close the tab, it's not possible to get back to it.

I was under the impression that docker checkpoint might do the trick here.

tinyzimmer commented 3 years ago

I was just able to reproduce the bug you have been trying to describe 😆. The error:

Cannot read property 'spec' of undefined

This should not be the case, but yeah, the steps to reproduce were:

This should be working and might be related to a recent UI change. I'll take a stab at fixing it tonight.


I was under the impression that the docker checkpoint might do the trick here.

Yeah, but I don't really have access to that. I am talking to Docker via Kubernetes. And besides the fact that Kubernetes isn't necessarily backed by Docker all the time, there are no APIs exposed for taking a "checkpoint" of a pod. Pods are generally supposed to be stateless, and you can't just "stop" them and turn them back on at that same point.

It's K8s' world, we are just living in it

tinyzimmer commented 3 years ago

I found the new tab issue and I'll publish a new patch release in a bit.

hananmoiseyev commented 3 years ago

Amazing!

What is the best option to maintain sessions then? Keep the browser open and use other tabs? Maybe write a small script that will use the API to hold on to that handshake with the VDI cluster?

tinyzimmer commented 3 years ago

The intended behavior, and what will be the case with this fix, is that you can close the browser, close the tab, switch panels, whatever. It'll be there when you get back. The UI does terminate your sessions when you click the logout button, but you don't have to do that. I might change this behavior in the future.

hananmoiseyev commented 3 years ago

So the pods will remain there for life? (I tend to like that in my case, but a flag to control it would be a great option.) If the sessions are staying there (I thought you planned to not have the pods live on after the session died), I am good. Waiting for the magic build!

tinyzimmer commented 3 years ago

There is already a flag to control that. See maxSessionLength in the VDICluster config.

tinyzimmer commented 3 years ago

To clarify, because I see the description in the documentation is a little lacking: you can set that to something like 8h, which will have sessions automatically terminate after 8 hours.
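A minimal sketch of setting that, assuming the VDICluster from this install is named kvdi (note that a helm upgrade may overwrite a manual edit, so mirror the change in your values if needed):

kubectl edit vdicluster kvdi
# spec:
#   maxSessionLength: 8h   # sessions are terminated automatically after 8 hours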

tinyzimmer commented 3 years ago

This will be the "magic" build by the way. Takes 15-20 minutes these days. Then new images will be available for you to do a helm upgrade with.

hananmoiseyev commented 3 years ago

Great. I just hope that the upgrade will work... Sometimes I have to do a fresh install.

tinyzimmer commented 3 years ago

That's odd, and if it happens again, I'd be curious about the errors you get. The only time helm should have trouble is if the CRDs are being updated in the release, in which case you need to apply the updated CRDs before running helm upgrade.

EDIT: But that won't be the case here
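For reference, a hedged sketch of that flow; the release and chart names are assumed from this thread's install, and <crds-url> stands in for the CRD manifest linked from the GitHub release:

kubectl apply -f <crds-url>   # only needed when the release updates the CRDs
helm repo update
helm upgrade kvdi kvdi/kvdi   # then upgrade the release as usual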

hananmoiseyev commented 3 years ago

BTW, in your last release, 0.3.1, the URL of one of the CRDs is wrong. And with this new info, I would just run the CRD update every time I run the helm upgrade. Is that a bad idea?

tinyzimmer commented 3 years ago

Good catch, fixed. It's wrong in the README too. You certainly can, to be on the safe side, and most of the time it'll be fine. But it depends on what's going on in that release. If the upgrade process is ever more involved, I'll probably call it out.

hananmoiseyev commented 3 years ago

ok.

About the maxSessionLength flag: is that absolute? Can I set it to none, or to forever?

tinyzimmer commented 3 years ago

Undefined is forever.

tinyzimmer commented 3 years ago

To clarify, leave it out entirely or set it to an empty string

hananmoiseyev commented 3 years ago

Error: UPGRADE FAILED: cannot patch "kvdi-manager" with kind Deployment: Deployment.apps "kvdi-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"kvdi", "app.kubernetes.io/managed-by":"Helm", "app.kubernetes.io/name":"kvdi", "app.kubernetes.io/version":"v0.3.2", "helm.sh/chart":"kvdi-v0.3.2"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

tinyzimmer commented 3 years ago

Thank you! This is a weird error indeed. I wasn't aware Deployments had any immutable fields. I'll have to find a fix for this in the helm chart (which I actually generate from the kustomize manifests - it's a story by itself).

tinyzimmer commented 3 years ago

You don't necessarily need to use helm to upgrade - you just need to patch the kvdi-manager deployment to use the latest image. It'll then replace the app with the image matching its version.

tinyzimmer commented 3 years ago
$ k describe deployment kvdi-manager 
Name:                   kvdi-manager
Namespace:              default
CreationTimestamp:      Sun, 28 Feb 2021 20:00:20 +0200
Labels:                 app.kubernetes.io/instance=kvdi
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=kvdi
                        app.kubernetes.io/version=v0.3.2
                        helm.sh/chart=kvdi-v0.3.2
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: kvdi
                        meta.helm.sh/release-namespace: default
Selector:               app.kubernetes.io/instance=kvdi,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=kvdi,app.kubernetes.io/version=v0.3.2,helm.sh/chart=kvdi-v0.3.2
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=kvdi
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kvdi
                    app.kubernetes.io/version=v0.3.2
                    helm.sh/chart=kvdi-v0.3.2
  Service Account:  kvdi
  Containers:
   kube-rbac-proxy:
    Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    Environment:  <none>
    Mounts:       <none>
   manager:
    Image:      ghcr.io/tinyzimmer/kvdi:manager-v0.3.2   # <- Point it to this image
   ...
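A hedged one-liner equivalent of that edit (the container name manager and the image tag are taken from the describe output above):

kubectl set image deployment/kvdi-manager manager=ghcr.io/tinyzimmer/kvdi:manager-v0.3.2
kubectl rollout status deployment/kvdi-manager   # wait for the new manager pod to come up
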
hananmoiseyev commented 3 years ago

It doesn't let me edit it. No idea what's going on here... I will wait for the helm upgrade to start working, or for me to have time to recreate the entire thing (or finish my init scripts).

tinyzimmer commented 3 years ago

A third option.

kubectl delete deployment kvdi-manager

Then try the helm upgrade.

I can try to find a fix for the chart, but I dunno how long it'll take me.

tinyzimmer commented 3 years ago

In case you are worried, the manager is stateless. It's safe to delete that deployment. Just nothing really works while it isn't there.

tinyzimmer commented 3 years ago

I can fix the helm chart going forward, but since your deployment already has the bad fields, you'd have to delete it anyway before you could upgrade to a non-broken chart.

But again, you shouldn't have to delete the entire helm release. Just deleting the Kubernetes deployment should be fine, and you'll keep all your configurations.

hananmoiseyev commented 3 years ago

Managed to edit the deployment eventually. It took quite a while to load and for the pods to come up.

Seems like kvdi is fine now. Now we start testing.

hananmoiseyev commented 3 years ago

Here is another riddle:

I plan to have 15 users with 4 templates each. Since I need to set mounts for each user differently, I can't use a generic template, so I will be making 60 templates.

Can I at least do something to filter the list of templates in the GUI for non-admin users, so they will see only their own templates?

tinyzimmer commented 3 years ago

I do want to find an easier way to make what you are trying to do possible, but it'll probably take time.

That being said, you could leverage the RBAC system to achieve this, but it would still be a bunch of extra shit to configure on your end. A user only sees in the UI what their permissions allow them to see and/or use. That applies to users and roles, but also templates, namespaces, service accounts, etc.

So if you created a unique role for each user, with a rule similar to the one in the default kvdi-launch-templates role except that it sets a resourcePattern like *<username>*, then that user will only see templates in the UI that match the pattern.

Did that make sense?

tinyzimmer commented 3 years ago

FWIW, this would be easier to automate now with the new CLI. It's just piggybacking off the existing functionality of the API/UI, but now gives it to you at the command-line for doing stuff like this easily.

hananmoiseyev commented 3 years ago

The best solution is to support ${USERNAME} in the template. If you are parsing that prior to applying it, that seems possible. Then I will only need to create basic templates.

The other possible option for me is to use the roles. Ugly, but it will probably work.

I have not yet tried the CLI. I will do so soon, probably.

tinyzimmer commented 3 years ago

Haha yay it works...something like this...

for user in {foo,bar} ; do 
    kvdi roles create --name=${user}-role \
        --verbs=read,use,launch --resources=templates --namespaces=default --resource-patterns=".*?${user}.*?"
    kvdi user update ${user} --roles=${user}-role
done

I just made it, so I'm still having fun with it 😛


The best solution is to support ${USERNAME} in the template. If you are parsing that prior to applying it, that seems possible. Then I will only need to create basic templates.

It's what I'm leaning towards, but it comes with potholes along the way. A good chunk of the logic converting templates to a pod spec on the fly would need to be refactored a bit.

hananmoiseyev commented 3 years ago

Cool.

I can wait on that a little longer. I am still doing tests.

hananmoiseyev commented 3 years ago

BTW, can you suggest a backup method for the configurable things? Users, basically.

tinyzimmer commented 3 years ago

Everything is configurable 😉. For the YAML, just hold on to the YAML files, obviously.

For users, the localAuth option is really only intended for development. It's not very robust by any means, nor super secure (as mentioned in the README). Better to at least set up a Vault instance and use that if you are concerned about that sort of thing.

That being said, assuming you are using the k8sSecret backend, the entire "database" containing the info is kept in the Kubernetes secret where the other secrets are (it would be in Vault instead if you were using the vault backend). Specifically, there should be a passwd field that sorta resembles your typical /etc/passwd file.

$ kubectl get secret kvdi-app-secrets -o json | jq .data.passwd -r | base64 -d
admin:kvdi-admin:$2a$04$DD6Tc6I1GkvPIpro3p27wOt403qyhxAUtVGj9kxFL/g6y1zEzUWTG

Obviously never share that, and if you copy it, keep it in a secure place. But you could in theory just keep a backup of that secret and restore it onto a fresh installation to get that data back.
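A minimal backup/restore sketch along those lines, assuming the k8sSecret backend and the default secret name shown above:

kubectl get secret kvdi-app-secrets -o yaml > kvdi-app-secrets.backup.yaml   # store this somewhere safe
# on a fresh install, strip metadata.resourceVersion/uid from the file, then:
kubectl apply -f kvdi-app-secrets.backup.yaml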

hananmoiseyev commented 3 years ago

OK, noted. Can I somehow add DNS info to the template? It seems like the pod has nothing configured there.

hananmoiseyev commented 3 years ago

Here is another UX mission: when in full screen, hitting the Esc key takes you out of full screen. On the other hand, Vim is a very big Esc key consumer.

I know that you use the browser's full-screen feature, but that should be changed to allow working in full screen without interference.

tinyzimmer commented 3 years ago

Can I somehow add DNS info to the template? It seems like the pod got nothing configured there

I can see adding DNS configuration options in the future. There definitely is something configured there, though; at the very least the Kube DNS and whatever it forwards to on the host.

When in full screen, hitting the ESC key will take you out of the full screen. On the other hand, VIM is a very big ESC key consumer.

Practically speaking, I can't just disable the Esc key behavior. I could potentially intercept it, though, and leave just F11 for getting out of full screen. I'll play with it in the future, but I'm probably going to be too busy today.

tinyzimmer commented 3 years ago

I'm going to create two new issues to track these. Let's move the discussion about the USERNAME parameter stuff to #23