Couple more Windows issues

oviano commented 2 months ago

Inside tcp_recv it can get stuck in an endless loop due to POLLHUP being set. I assume the other end has closed the connection after sending the tcp response so what happens is the recv successfully receives all the data but then loops forever as it doesn't check for POLLHIUP.

It can be fixed by adding a bit of code at the bottom like this, to treat POLLHUP as a zero byte read, which is then handled by the caller to mean "we're done".

        if (pfd.revents & POLLHUP) {
            return 0;
        }
    }

    // Timeout
    PLUM_LOG_WARN("TCP recv timeout");
    return -1;
}

client interruptions are not working for me on Windows; I found that even after it issues udp_sendto_self() on the impl->sock this wasn't triggering POLLIN for the socket inside upnp_impl_wait_response. This led to it being impossible to cleanup the library and stop the client, and I noticed it via the UPnP implementation.

It works correctly on Linux, but apparently not Windows. I found if I create a dedicated "interrupt_sock" and poll on this too inside upnp_impl_wait_response, then calling udp_sendto_self on this dedicated socket does trigger the poll to wake up.

I can only assume that after using the regular impl->sock for sending a broadcast and receiving UPnP replies, that thereafter it doesn't work with a self-send. I don't know what would cause that on the Windows platform, maybe relating to the way it handles broadcasts?

One thing - I found that the PCP code does not suffer from this issue. In this case, the impl->sock allows triggering POLLIN. The difference vs UPnP is this sock is not used for a broadcast, which perhaps reinforces my theory above.

oviano commented 2 months ago

https://github.com/paullouisageneau/libplum/pull/28 https://github.com/paullouisageneau/libplum/pull/29

oviano commented 2 months ago

client interruptions are not working for me on Windows; I found that even after it issues udp_sendto_self() on the impl->sock this wasn't triggering POLLIN for the socket inside upnp_impl_wait_response. This led to it being impossible to cleanup the library and stop the client, and I noticed it via the UPnP implementation.

It works correctly on Linux, but apparently not Windows. I found if I create a dedicated "interrupt_sock" and poll on this too inside upnp_impl_wait_response, then calling udp_sendto_self on this dedicated socket does trigger the poll to wake up.

I can only assume that after using the regular impl->sock for sending a broadcast and receiving UPnP replies, that thereafter it doesn't work with a self-send. I don't know what would cause that on the Windows platform, maybe relating to the way it handles broadcasts?

One thing - I found that the PCP code does not suffer from this issue. In this case, the impl->sock allows triggering POLLIN. The difference vs UPnP is this sock is not used for a broadcast, which perhaps reinforces my theory above.

I've investigated this further as I wasn't happy with the solution of creating a second socket; I have found that the issue seems to be caused by the explicit bind to port 1900 when creating the UDP socket. If I remove this/set it to zero, then it works correctly and udp_sendto_self now interrupts the socket on Windows. It continues to work on Linux.

I'm wondering why the explicit bind to port 1900 necessary? It looks like it is causing a bind to local port 1900 (node is NULL in getaddrinfo, AI_PASSIVE is specified). But why do we need to bind to local port 1900 if we are only interested in replies to our own broadcasts?

Still, I'm not sure why it stops the interruption on the Windows platform only.

paullouisageneau commented 2 months ago

I think there are two possibility here:

udp_get_local_addr() fails or doesn't return the correct address for the socket
Sending to the socket actually doesn't work for some reason

I'm wondering why the explicit bind to port 1900 necessary? It looks like it is causing a bind to local port 1900 (node is NULL in getaddrinfo, AI_PASSIVE is specified). But why do we need to bind to local port 1900 if we are only interested in replies to our own broadcasts?

It could be necessary to receive broadcasts from the UPnP router to indicate address change or service going away, but it is not implemented currently, only responses to requests are interpreted, the rest are dropped in upnp_idle().

If it solves the issue on Windows, I think binding to a random port by passing port 0 and disabling enable_reuseaddr would be a correct solution.

oviano commented 2 months ago

So I added some trace and udp_get_local_addr() does succeed and returns the expected 127.0.0.1:1900. The bound address was also as expected 0.0.0.0:1900.

Oddly, if I change the explicit port it binds to to anything other than 1900 then it works. Or zero works too.

So I think the send is failing (although, silently, it doesn't return an error code). I assume it's a bug/oddity on Windows relating to the socket being used for broadcasts.

oviano commented 2 months ago

enable_reuseaddr

If it solves the issue on Windows, I think binding to a random port by passing port 0 and disabling enable_reuseaddr would be a correct solution.

If I make a PR, do you want it wrapped in ifdef for Windows only?

paullouisageneau commented 2 months ago

So I added some trace and udp_get_local_addr() does succeed and returns the expected 127.0.0.1:1900. The bound address was also as expected 0.0.0.0:1900.

Oddly, if I change the explicit port it binds to to anything other than 1900 then it works. Or zero works too.

So I think the send is failing (although, silently, it doesn't return an error code). I assume it's a bug/oddity on Windows relating to the socket being used for broadcasts.

Weird indeed, maybe multicast reception prevents you from sending on loopback for some obscure reason. That wouldn't be the first quirk (not to say bug) in Windows sockets that would never be fixed.

If I make a PR, do you want it wrapped in ifdef for Windows only?

No, you can make the change for every OS!

oviano commented 2 months ago

https://github.com/paullouisageneau/libplum/pull/36

paullouisageneau / libplum

Couple more Windows issues #24