sandia-minimega / minimega

minimega
GNU General Public License v3.0
148 stars 66 forks source link

miniccc: attempt to reconnect to serial cc on disconnect #1436

Closed activeshadow closed 3 years ago

activeshadow commented 3 years ago

miniccc clients currently fail to reconnect if the miniccc process is restarted (for example, when a VM is rebooted). This PR addresses the issue.

activeshadow commented 3 years ago

FYI, just tested this with Windows and it doesn't work (I had only tested with Linux originally). Seems the custom virtioPort io.ReadWriteCloser for Windows doesn't return an EOF like the Linux implementation does.

activeshadow commented 3 years ago

This now works for both Windows and Linux VMs.

activeshadow commented 3 years ago

@activeshadow LGTM some observations when issuing commands on both win10 and ubuntu18 and the machines restart the connections are closed but since the commands are still in queue it attempts to issue the commands resulting in following error: "read unix @->/tmp/minimega/0/cc: use of closed network connection" when machines comes back the error goes away. can you add a patch so that it doesn't attempt use a closed connection?

@aherna I assume it's the ron server printing this error message, and not the miniccc client? Can you provide enough detail so I can reproduce the error message in order to make sure I address it properly?

activeshadow commented 3 years ago

@aherna can you confirm it's the ron server printing this error message?

Also, do the commands being sent when this error is printed get lost, or do they get queued up again since they weren't actually sent?

activeshadow commented 3 years ago

@aherna I was able to reproduce and answer my questions above.

aherna commented 3 years ago

@aherna can you confirm it's the ron server printing this error message?

Also, do the commands being sent when this error is printed get lost, or do they get queued up again since they weren't actually sent?

@activeshadow Im 90% sure that its server.go in src/ron. The only other place that shows up is in conatiners.go, minitunnel.go, and kvm.go. src/minimega/kvm.go would be my next logical thought of where it was throwing this error.

The commands stay queued they dont get lost, unless user issues clear cc commands.

activeshadow commented 3 years ago

@aherna I don't think this error is hurting anything, since commands don't get lost.

The error is being returned by the call to c.dec.Decode in ron/server.go here. c.dec is a Gob decoder tied to the unix socket for the client's virtual serial port, so it returns this error when the unix socket is closed when we detect the VSERPORT_CHANGE event when the VM reboots.

Looking at my logs, and based on how the code was before I added the VSERPORT_CHANGE stuff, it looks like this was happening even before I made the changes in this PR.

There's a few ways to fix it... one is to do what's currently being done at the end of the clientHandler function now and just not print the error if it's this read unix error. It's a common approach to block on read and handle these types of errors like this. The other would be to have another for loop that checks to see if the connection is closed or if data is ready via something like bufio.Peek before moving on to c.dec.Decode but this seems unnecessary.

Thoughts? Would be great to get this merged in!