shawnanastasio / libkvmchan

An implementation of the Xen vchan API on KVM
Other
10 stars 4 forks source link

Xen implementation of `libkvmchan_client_init` blocks #20

Closed nrgaway closed 3 years ago

nrgaway commented 3 years ago

libvchan_client_init

I have had some strange issues where libvchand would fail to connect some qrexec client connections initiated from the host when the VM server connection had not been started. As I mentioned previously I had added code to qrexec-daemon.c to keep retrying the connection every second with a timeout period. After implementing qubes-gui-agent, I was getting other similar failures. I discovered that the the Xen implementation of libvchan_client_init blocks and does not support connection timeout. This is described within the main function of 'core-qrexec/daemon/qrexec.client.c'. The code uses alarm(2) as a timeout. Therefore I updated the libvchan_client_init within core-vchan-kvm to be blocking so it would behave in the expected manner. Once I implemented this change things started working and communicating.

libvchan_t *libvchan_client_init(int domain, int port) {
    /**
     * It appears the Xen implementation of `libvchan_client_init` is blocking
     * and does not support connection timeout as described within the main
     * function of 'core-qrexec/daemon/qrexec.client.c'. The code uses alarm(2)
     * as a timeout.
     */
    libvchan_t *vchan;

    vchan = libkvmchan_client_init(domain, port);
    while (!vchan) {
        usleep(1 * 1000 * 1000);
        vchan = libkvmchan_client_init(domain, port);
    }
    return vchan;
}

However I have not submitted a pull-request for this change as it does create an additional issue and was not sure if you would prefer to handle blocking within the libkvmchan code. The issue that this code creates is that it keeps trying to reconnect even if the VM is no longer running :) So it may be better to handle it within libkvmchan, or need to add code on each iteration to return if VM is not running either by communicating with Virsh or Qubes for VM status, or just confirm process id still exists for VM.

shawnanastasio commented 3 years ago

Oh, when I implemented client_init I didn't realize the Xen behavior was to block. Is the timeout user-controllable or is it just hardcoded in Xen's vchan? In any case, this probably belongs in libkvmchan's library.c implementation with select(2) or something used to implement the timeout. I can implement it or you can submit a PR, just let me know.

nrgaway commented 3 years ago

I agree it belongs in libkvmchan's library. The Xen client_init has has no timeout option as far as I know (just looked at the Xen code). The alarm I was referring to was a comment made in the Qubes core-qrexec/daemon/qrexec-client.c code which sets an alarm due to the Xen client_init function having no timeout.

Go ahead and implement it since I am still not familiar enough with the internals. Be aware that even though it is blocking it should return (false?) if the domain is no longer running. I came across that issue when implementing the above where it still attempted to connect to a domain that was no longer running.