Closed nrgaway closed 3 years ago
Oh, when I implemented client_init I didn't realize the Xen behavior was to block. Is the timeout user-controllable or is it just hardcoded in Xen's vchan? In any case, this probably belongs in libkvmchan's library.c implementation with select(2) or something used to implement the timeout. I can implement it or you can submit a PR, just let me know.
I agree it belongs in libkvmchan's library. The Xen client_init has has no timeout option as far as I know (just looked at the Xen code). The alarm I was referring to was a comment made in the Qubes core-qrexec/daemon/qrexec-client.c code which sets an alarm due to the Xen client_init function having no timeout.
Go ahead and implement it since I am still not familiar enough with the internals. Be aware that even though it is blocking it should return (false?) if the domain is no longer running. I came across that issue when implementing the above where it still attempted to connect to a domain that was no longer running.
libvchan_client_init
I have had some strange issues where libvchand would fail to connect some qrexec client connections initiated from the host when the VM server connection had not been started. As I mentioned previously I had added code to
qrexec-daemon.c
to keep retrying the connection every second with a timeout period. After implementingqubes-gui-agent
, I was getting other similar failures. I discovered that the the Xen implementation oflibvchan_client_init
blocks and does not support connection timeout. This is described within the main function of 'core-qrexec/daemon/qrexec.client.c'. The code uses alarm(2) as a timeout. Therefore I updated thelibvchan_client_init
withincore-vchan-kvm
to be blocking so it would behave in the expected manner. Once I implemented this change things started working and communicating.However I have not submitted a pull-request for this change as it does create an additional issue and was not sure if you would prefer to handle blocking within the libkvmchan code. The issue that this code creates is that it keeps trying to reconnect even if the VM is no longer running :) So it may be better to handle it within libkvmchan, or need to add code on each iteration to return if VM is not running either by communicating with Virsh or Qubes for VM status, or just confirm process id still exists for VM.