vmware-archive / vsphere-storage-for-docker

vSphere Storage for Docker
https://vmware.github.io/vsphere-storage-for-docker
Apache License 2.0

TTY allocation on node breaks if vsphere storage plugin is installed #2078

Open · raptaml opened this issue 6 years ago

raptaml commented 6 years ago

If I install the latest version of vsphere-storage-for-docker as a managed plugin, from that moment on TTY allocation breaks and makes remote shell access impossible. I am using:

Ubuntu 16.04 LTS: (4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux)

Docker:

Server:
 Engine:
  Version: 18.03.0-ce
  API version: 1.37 (minimum version 1.12)
  Go version: go1.9.4
  Git commit: 0520e24
  Built: Wed Mar 21 23:08:31 2018
  OS/Arch: linux/amd64
  Experimental: false

ESXi VIB: esx-vmdkops-service 0.21.c420818-0.0.1 (via UM ZIP Bundle)

Docker plugin: vsphere-storage-for-docker:latest
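For reference, installing the managed plugin usually amounts to a single command; the sketch below is only an assumption about how it was set up here (the --grant-all-permissions flag and default settings may differ on the affected node):

# Hedged sketch: install and enable the managed plugin with default settings.
# --grant-all-permissions skips the interactive privilege prompts.
docker plugin install --grant-all-permissions vsphere-storage-for-docker:latest
docker plugin ls    # the plugin should now be listed as ENABLED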

When I remove the plugin and reboot the node, everything starts working as normal. I have also tried vsphere-storage-for-docker:0.20 and vsphere-storage-for-docker:0.19, which show the same behaviour.

/var/log/auth.log shows:

Mar 27 09:57:42 SERVERNAME sshd[2451]: error: openpty: No such file or directory
Mar 27 09:57:42 SERVERNAME sshd[2487]: error: session_pty_req: session 0 alloc failed
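A quick way to confirm that PTY allocation itself is broken, without going through sshd (a sketch; script is simply a convenient program that calls openpty):

# On an affected node this fails the same way sshd does.
script -c true /dev/null

# Inspect what is mounted on /dev/pts and with which options.
grep devpts /proc/mounts
ls -l /dev/ptmx /dev/pts/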

This is totally reproducible here. Any ideas?

govint commented 6 years ago

@raptaml thanks for letting us know of this issue, but can you say how you are establishing the remote shell with the VM running the plugin? The plugin is just a process on the host, and it's not using TTYs either, only a socket to the ESX host (vSockets). Does a reboot after installing the plugin help? This may have more to do with Docker than with the plugin.

raptaml commented 6 years ago

@govint any remote shell will produce the error. I am trying to ssh into the VM from Linux, but PuTTY from Windows does not work either. A reboot does not fix the problem, but uninstalling the plugin and then rebooting does. Even immediately after installing the plugin, a sudo command in the same remote shell gives me: "no tty present and no askpass program specified". I then have to uninstall the plugin via a local shell and reboot the server. Strange...

govint commented 6 years ago

OK, but this is exactly how we use the plugin and we don't see any issue. This is what I have on Ubuntu 14.04:

$ sudo docker plugin ls
[sudo] password for guest:
ID            NAME            DESCRIPTION                          ENABLED
56866816829b  vsphere:latest  VMWare vSphere Docker Volume plugin  true

$ sudo docker version
Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built:         Wed Mar 21 23:10:22 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:08:52 2018
  OS/Arch:      linux/amd64
  Experimental: false

$ id
uid=1001(guest) gid=1001(guest) groups=1001(guest),27(sudo)

I'm not able to see this behavior reported elsewhere either. Let me check

johnbaker commented 6 years ago

I have the same behavior on a fully updated Ubuntu Server 16.04 install. As soon as I install the vSphere storage plugin, I experience the same behavior. After trying a few other plugins, I have found this isn't the only plugin that causes it. For example, the Pure Storage plugin (store/purestorage) also fails.

Not all storage plugins cause the issue; for instance, I can still use the SSHFS plugin (vieux/sshfs).
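For anyone narrowing this down, the per-plugin comparison can be sketched roughly like this (a hypothetical loop; some plugins need extra settings at install time, so treat it as an outline rather than a ready-made script):

# Install each plugin, check whether a PTY can still be allocated, then remove it.
for p in vsphere-storage-for-docker:latest store/purestorage vieux/sshfs; do
    docker plugin install --grant-all-permissions "$p"
    script -c true /dev/null && echo "$p: PTY still OK" || echo "$p: PTY broken"
    docker plugin rm -f "$p"
done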

bartlenaerts commented 6 years ago

I have the same problem on a freshly installed CentOS 7 server. I installed the latest version of Docker (18.03.0-ce, build 0520e24) and created a swarm with several nodes. No problems so far; I can still connect to the server with SSH. As soon as the vSphere plugin is installed, an SSH connection to the server isn't possible anymore. Rebooting doesn't help. Only uninstalling the plugin and then rebooting works.

govint commented 6 years ago

Let me try the same config. With Ubuntu we have no issues using the plugin. I haven't seen this with Alpine, or even CentOS with earlier Docker versions, either. Let me recheck and post.

grekier commented 6 years ago

Same problem here on Ubuntu 16.04.4 LTS with kernel 4.4.0-119. I also noticed a recurrent error in the plugin log:

Failed to get volume meta-data name=XXX error="Volume XXX not found (file: /vmfs/volumes/MAIN-RAID5/dockvols/_DEFAULT/XXX.vmdk)"

Also worth mentioning is that I have a swarm with 3 master nodes. Don't know if it helps, but I thought I would add the info. Same version of Docker and plugin as above.
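If it helps to correlate, managed plugins run under dockerd, so their output can usually be pulled out of the daemon's journal (a sketch; the grep pattern is an assumption):

# Plugin stdout/stderr is captured by the Docker daemon.
journalctl -u docker.service | grep -i "volume meta-data"

# List the volumes Docker currently knows about, to compare against the
# names reported as "not found".
docker volume ls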

raptaml commented 6 years ago

@govint Do you have any update for us? Should we file a bug with the docker team or what would you suggest?

ghost commented 6 years ago

Good Day,

I have the same issue, and it has to do with devpts. When installing plugins, a second devpts is mounted. Below are the options of the two mounts:

rw,relatime,mode=600,ptmxmode=000
rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666

The first works with no issues; with the plugins, the second set of options comes into play, and that is when logins stop working unless you use the console.

From the console I umount both devpts instances and then mount as follows:

mount devpts /dev/pts -t devpts
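Spelled out, that workaround might look like the following, run from the VM console since SSH is unusable at this point (the -o options are an assumption based on the working defaults quoted above):

# Unmount both devpts instances, then remount a single one.
umount /dev/pts
umount /dev/pts
mount devpts /dev/pts -t devpts -o gid=5,mode=620,ptmxmode=000

# Verify that only one devpts mount remains and that PTYs work again.
grep devpts /proc/mounts
script -c true /dev/null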

Regards

grekier commented 6 years ago

I see almost the same as @Eireocean:

devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)

The second line is inserted after the plugin install. As a side note, actually removing the last one fixes the issue for me.

Not sure if it helps but it seems that something similar happened earlier in systemd (https://github.com/systemd/systemd/issues/337)

liqdfire commented 6 years ago

I have the same issue with docker 18.03

I removed it and re-installed 17.12.1-ce, and had no issue re-installing the volume plugin.

SaintMartin commented 6 years ago

I also ran into the same problem. I was running a 6-node swarm (3 managers) on:

Docker Version: 18.03.0-ce
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
ESXi 6.0 (VM version 11)
ESXi VIB: VMWare_bootbank_esx-vmdkops-service_0.21.c420818-0.0.1.vib
Docker plugin: vsphere-storage-for-docker:latest

One day, when I attempted to ssh into any of the nodes, I got: "PTY allocation request failed on channel 0". The solution was to log in to the VMware console, remove the volumes, disable the plugin, and reboot the VM.
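In shell terms, that recovery sequence is roughly the following (the volume name is hypothetical; adjust to your setup):

# Remove the plugin-backed volumes, disable the plugin, then reboot.
docker volume rm my_vsphere_vol
docker plugin disable vsphere-storage-for-docker:latest
reboot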

BTW, the problem started around March 26. According to /var/log/vmware-vmsvc.1.log:

[Mar 26 13:42:59.026] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:42:59.026] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:43:29.024] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:43:29.024] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:43:59.024] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:43:59.024] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:44:29.024] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:44:29.024] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:44:59.025] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:44:59.025] [ warning] [guestinfo] Failed to get disk info.

And so on.

I hope that helps.

grekier commented 6 years ago

It seems like upgrading to Docker 18.03.1-ce fixes this issue. At least I can SSH to my Docker servers now, with kernel 4.4.0-124.
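If someone wants to pin that release explicitly on Ubuntu, it is roughly this (the exact version string is an assumption; take it from the madison output):

# List the packaged docker-ce versions, then install the fixed release.
apt-cache madison docker-ce
sudo apt-get install docker-ce=18.03.1~ce-0~ubuntu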