sandia-minimega / minimega

GNU General Public License v3.0

minimega: novnc unable to reconnect #1257

Closed mkunz7 closed 4 years ago

mkunz7 commented 5 years ago

I've noticed a number of times now that novnc works on a VM for a while and then can no longer connect to it, showing a black screen that says "Connecting...". Other virtual machines still work.

I can still take a screenshot with vm screenshot and control the VM with vncviewer, but novnc won't reconnect. This shows up in miniweb's output:

root@server:~/minimega# bin/miniweb -console
2019/02/19 16:28:53 ERROR ws.go:34: dial tcp 127.0.1.1:38049: connect: connection refused

38049 is not the VNC port of any VM:

minimega:/tmp/minimega/minimega$ .columns vnc_port vm info
host  | vnc_port
server | 44029
server | 45438

The browser's developer console shows:

Failed when connecting: Connection closed (code: 1006)
rfb.js:688:17
_fail
http://127.0.0.1:1234/novnc/core/rfb.js:688:17
RFB/<
http://127.0.0.1:1234/novnc/core/rfb.js:222:17
open/this._websocket.onclose<
http://127.0.0.1:1234/novnc/core/websock.js:225:13

When this happens, novnc won't work until I kill and start the VM again.

vnc inject win1 PointerEvent,0,0,0 is unable to dial the connection either:

2019/02/20 14:46:58 DEBUG command_socket.go:180: got request over socket: {vnc inject win1 PointerEvent,0,0,0  }
2019/02/20 14:46:58 DEBUG vms.go:477: applying main.(*VMs).Apply to win1
2019/02/20 14:46:58 INFO kvm.go:557: vnc shim connect: 127.0.0.1:50944 -> win1
2019/02/20 14:46:58 ERROR kvm.go:565: unable to dial vm vnc: dial unix /tmp/minimega/45/vnc: connect: no such file or directory

jcrussell commented 5 years ago

That's strange:

ERROR kvm.go:565: unable to dial vm vnc: dial unix /tmp/minimega/45/vnc: connect: no such file or directory

Did QEMU fail to create the domain socket for VNC?

jcrussell commented 5 years ago

@mkunz7: have you seen this since we rolled back novnc?

jcrussell commented 5 years ago

Possibly related: https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05455.html

djfritz commented 4 years ago

What's the status on this? @mkunz7 @jcrussell

mkunz7 commented 4 years ago

Reliably happens with OS X High Sierra and QEMU > 2.9. I've noticed it elsewhere, but much more rarely.

djfritz commented 4 years ago

So what's the plan here then? This looks like an existing bug in QEMU that we can't work around. It looks like it may have been fixed in 2.9.0, but @mkunz7 says otherwise...

@mkunz7 can you enable debug tracing or verbose logging in QEMU and see whether QEMU is closing the socket?

djfritz commented 4 years ago

@mkunz7 this is the only holdout for cutting 2.6. Can you comment? We could push this to 2.7 or just invalidate it as it seems like maybe it's a qemu thing. Thoughts?

mkunz7 commented 4 years ago

Checking this now. Give me a second to update the kernel and qemu and look closer at this.

mkunz7 commented 4 years ago

The OS X issue is fixed on Ubuntu 18.04 with the 5.28 kernel and QEMU 4.1.0. I can't recreate it reliably. Close it for now.

jcrussell commented 4 years ago

Thanks for investigating!