kasmtech / workspaces-issues


[Bug] - Containers such as Fedora 40 not starting after upgrade to workspaces 1.16 #627

Closed luc-vocab closed 3 days ago

luc-vocab commented 1 month ago

Existing Resources

Describe the bug After upgrading to Kasm Workspaces 1.16, some containers such as Fedora 40 are not starting up. Here are logs from the docker daemon:

Sep 30 19:21:58 workspaces dockerd[43508]: time="2024-09-30T19:21:58.228816615+08:00" level=info msg="ignoring event" container=883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 19:21:58 workspaces dockerd[43508]: time="2024-09-30T19:21:58.326355900+08:00" level=error msg="Error setting up exec command in container 883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261: Container 883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261 is restarting, wait until the container is running" spanID=c8e720c5e0edaecc traceID=a2a91cc25b7c31e079afa7b8668cacf6
Sep 30 19:21:58 workspaces dockerd[43508]: time="2024-09-30T19:21:58.328944607+08:00" level=error msg="Error setting up exec command in container 883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261: Container 883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261 is restarting, wait until the container is running" spanID=06f31766f72d9402 traceID=465405bce2618a4142f0eefd1ff51666
Sep 30 19:21:58 workspaces dockerd[43508]: time="2024-09-30T19:21:58.478565080+08:00" level=info msg="ignoring event" container=883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 19:21:58 workspaces dockerd[43508]: time="2024-09-30T19:21:58.486317974+08:00" level=warning msg="ShouldRestart failed, container will not be restarted" container=883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261 daemonShuttingDown=false error="restart canceled" execDuration=123.597838ms exitStatus="{137 2024-09-30 11:21:58.472815308 +0000 UTC}" hasBeenManuallyStopped=true restartCount=1
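
Editor's note: the last log line reports `exitStatus` 137. By the usual shell convention, exit codes above 128 mean the process was terminated by signal number (code - 128). The sketch below decodes such a code; the function name is hypothetical and is not part of Docker or Kasm.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Describe a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error code {code}"


# On a Linux host, 137 decodes to SIGKILL (signal 9).
print(describe_exit_status(137))
```

Together with `hasBeenManuallyStopped=true`, this suggests the container was killed from outside rather than exiting on its own. It does not say what triggered the kill.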

Seeing the following error in the Kasm Workspaces logging page in the web interface:

host: workspaces
ingest_date: 202409301121
application: kasm_api
levelname: ERROR
kasm_user_name: luc
process: client_api_server
client_ip: [2001:470:f82b:a::6]:53154
user_agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36
message
An Unexpected Error occurred creating the Kasm. Please contact an Administrator : Error during Create request for Server(14cc3c29-e0ac-451a-9523-d632832efdd6) : (Exception creating Kasm: Traceback (most recent call last):
  File "docker/api/client.py", line 265, in _raise_for_status
  File "requests/models.py", line 1021, in raise_for_status
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: http+docker://localhost/v1.47/containers/883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261/exec

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "__init__.py", line 574, in post
  File "provision.py", line 1943, in provision
  File "docker/models/containers.py", line 205, in exec_run
  File "docker/utils/decorators.py", line 19, in wrapped
  File "docker/api/exec_api.py", line 79, in exec_create
  File "docker/api/client.py", line 271, in _result
  File "docker/api/client.py", line 267, in _raise_for_status
  File "docker/errors.py", line 39, in create_api_error_from_http_exception
docker.errors.APIError: 409 Client Error for http+docker://localhost/v1.47/containers/883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261/exec: Conflict ("Container 883200152b997a3fe0c8a591acdc42c095991aef37361648917e284dae404261 is restarting, wait until the container is running")
)
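
Editor's note: the traceback shows `exec_run` failing with HTTP 409 because the container was in the `restarting` state when the exec was attempted. The sketch below polls a container's state before running an exec. It is not Kasm's provisioning code; the helper name, timeout, and interval are assumptions, and `container` only needs the `reload()` method and `status` attribute that the Docker SDK for Python's `Container` objects provide.

```python
import time


def wait_until_running(container, timeout=30.0, interval=0.5):
    """Poll until container.status is 'running'.

    Returns True once the container is running, False if `timeout`
    seconds pass first. `container` is assumed to behave like
    docker.models.containers.Container: reload() refreshes the cached
    state and .status is a string such as 'running' or 'restarting'.
    """
    deadline = time.monotonic() + timeout
    while True:
        container.reload()
        if container.status == "running":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a real daemon this could be used as `container = docker.from_env().containers.get(container_id)` followed by `if wait_until_running(container): container.exec_run("id")`. In this issue the container never reached `running`, so waiting would only turn the 409 into a timeout. The container's own output (`container.logs()`) is the more likely place to find the reason for the restart loop.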

To Reproduce Start Fedora 40 container.

Expected behavior I expect the container to start up.

Workspaces Version 1.16

Workspaces Installation Method Single Server

Client Browser (please complete the following information):

Workspace Server Information (please provide the output of the following commands):

Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 10
  Running: 10
  Paused: 0
  Stopped: 0
 Images: 30
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan kasmweb/sidecar:1.0 macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-45-generic
 Operating System: Ubuntu 24.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 7.614GiB
 Name: workspaces
 ID: 84f43ca8-8c93-4f4f-84f8-dcb5f136f936
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
89dad3882bfc   kasmweb/proxy:1.16.0               "/docker-entrypoint.…"   8 hours ago   Up 7 hours             80/tcp, 0.0.0.0:8443->8443/tcp, :::8443->8443/tcp   kasm_proxy
a2529adeb1b2   kasmweb/rdp-https-gateway:1.16.0   "/opt/rdpgw/rdpgw"       8 hours ago   Up 7 hours (healthy)                                                       kasm_rdp_https_gateway
d8143bace463   kasmweb/rdp-gateway:1.16.0         "/start.sh"              8 hours ago   Up 7 hours (healthy)   0.0.0.0:3389->3389/tcp, :::3389->3389/tcp           kasm_rdp_gateway
aedce1a2de25   kasmweb/agent:1.16.0               "/bin/sh -c '/usr/bi…"   8 hours ago   Up 7 hours (healthy)   4444/tcp                                            kasm_agent
5e7c81e9a697   kasmweb/share:1.16.0               "/bin/sh -c '/usr/bi…"   8 hours ago   Up 7 hours (healthy)   8182/tcp                                            kasm_share
8bb11f7b8748   kasmweb/manager:1.16.0             "/usr/bin/startup.sh…"   8 hours ago   Up 7 hours (healthy)   8181/tcp                                            kasm_manager
7a3906b7df5d   kasmweb/api:1.16.0                 "/bin/sh -c '/usr/bi…"   8 hours ago   Up 7 hours (healthy)   8080/tcp                                            kasm_api
94c7ea14ba1e   kasmweb/kasm-guac:1.16.0           "/dockerentrypoint.sh"   8 hours ago   Up 7 hours (healthy)                                                       kasm_guac
1a7d6dc34a1a   redis:5-alpine                     "docker-entrypoint.s…"   8 hours ago   Up 7 hours             6379/tcp                                            kasm_redis
803bf0279fc1   postgres:14-alpine                 "docker-entrypoint.s…"   8 hours ago   Up 7 hours (healthy)   5432/tcp                                            kasm_db


luc-vocab commented 1 month ago

Hi, is there any other information I can provide that would help lead to a resolution?

luc-vocab commented 3 days ago

I am still experiencing the error after upgrading to 1.16.1, though it doesn't happen consistently on all images.

luc-vocab commented 3 days ago

FIXED. I looked at the kasm_api container logs:

2024-11-23 02:18:59,750 [INFO] cherrypy.access.125139293186064: 172.18.0.11 - - [23/Nov/2024:02:18:59] "GET /api/admin/get_file_mapping_contents?token=eyJhbGciOiJSUzI1NiI sInR5cCI6IkpXVCJ9.eyJmaWxlX21hcF9pZCI6IjgzYjM2YTVjLWYyNjAtNGQzYi05ZjNkLTQ2ZmVjYWZlZGMxYSIsImV4cCI6MTczMjMyODYzOSwiYXV0aG9yaXphdGlvbnMiOls2MF19.daVo1zxfbEKUkdAcG5mpGRlB-65 J85VVd1c5Gp4UdBipu01isiWGnLsMvsLicR8zRnJXELIFNlYapHEcwzgQQlW8dyjHQKkLlIQWNqwu_Ij8L9-sIqlOq5YA6GyDJO9IG5IKocetOV-7LlMn5qVsFIizwN0QyeYmNQU8gTtokKMSaKCfSJ11WyIARmBiWTAUWy$ Ob4UIMr7yQDPIcExZQVHP8M1V5zJx2lTL7-Ao0Wthtk8kY1LZXliqmOr8cYjp1iAE5cgIkKLY7X1xTiNJOHeMo9sdGdivKBQVM7BqJqOM4pZAVHkOAyiBn9eUr2y9-fA24s1d_hYSKtxS9H2_DgrhRo_C-b4SMeKSP1F4WpeVq teEBUOq5CNcDWcsyYPbs123x5La88bSXYpC-av36EOOrSp78OJgr56yDDVnADfkurd-8-SIy9MvnSr_UA8V5Ec3Wcd-IsIO_ZHfvT5cjEt0c8TjuEeZOpnCjItnFT1z3r5JJID40MNSx0hFVrMkorPhm4ZHtrO3zq6sSQ2Jmyc 20i18PV7H9NOtevCNIu5EphsY9TyyI0KPJXrCmwEOX8wo2WBuNhEYDhH3bqFcBtIVl9sjo1l4umdwbbrEzLs1yrKOU9zEnUuQw7aang4Sg1m07T7Ncn6WHnRVAE-XyEjc58YniIo_QjlTlNEww HTTP/1.1" 200 170 "" "p ython-requests/2.31.0"

and it became clear to me it had to do with file mappings. I went to my user setup and removed the file mapping. Now, containers start again. I'll re-attempt my file mapping later.

j-travis commented 3 days ago

We can confirm that the Fedora 40 image works on an Ubuntu 24.04 host with a clean install of Workspaces 1.16.1.

It looks like you have pinpointed that at least part of the issue is with the file mapping.

This KB may help you troubleshoot what specifically is going wrong. https://kasmweb.atlassian.net/servicedesk/customer/portal/3/article/30048276

Regarding file mappings, pay special attention to the guide if you want to write files to the user's profile: https://kasmweb.com/docs/latest/guide/file_mappings.html#user-home-profile

Also, if you are overwriting scripts that are part of the entrypoint workflow, make sure you mark them as executable in the file mapping config.
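
Editor's note: one way to check the executable-bit point is to inspect the mapped files from inside a session container. The sketch below reports which paths lack any execute permission. The function name is hypothetical, and the paths to pass in depend on your own file mapping destinations.

```python
import os
import stat


def missing_exec_bit(paths):
    """Return the subset of `paths` that have no execute permission bit set."""
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    flagged = []
    for path in paths:
        mode = os.stat(path).st_mode
        if not mode & any_exec:
            flagged.append(path)
    return flagged
```

Calling it with the destination paths of any mapped scripts returns those that would fail to run when the entrypoint invokes them directly. An empty list means every script has at least one execute bit set.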