radarsat1 / liblo

liblo is an implementation of the Open Sound Control protocol for POSIX systems
GNU Lesser General Public License v2.1
192 stars 60 forks source link

cpp send_from segfault #101

Closed rbarreiros closed 4 years ago

rbarreiros commented 4 years ago

Hello,

I've been at this for 2 days, and I'm really at a loss here, without delving into the lib inards, I won't be able to figure this out, meybe it's actualy a bug or it's me being dumb...

I have the following code, it's an example just to cause the segfault, trying to keep it simple

#include <iostream>
#include <lo/lo.h>
#include <lo/lo_cpp.h>

lo::Address toB("192.168.1.60", 10023);

int AtoB_generic_handler(const char *path, const char *types, lo_arg **argv,
                          int argc, lo_message m, void *data)
{
    std::cout << "Local Got message " << path << std::endl;
    std::cout << "Local Msg ";
    lo_message_pp(m);

    lo::ServerThread *server = reinterpret_cast<lo::ServerThread*>(data);
    toB.send_from(server, path, m);

    return 1; 
}

int main(int, char**)
{
    lo::ServerThread AtoB(10023);
    lo::ServerThread BtoA(11023);

    if(!AtoB.is_valid())
    {
        std::cout << "AtoB Invalid" << std::endl;
        return 1;
    }

    if(!BtoA.is_valid())
    {
        std::cout << "BtoA Invalid" << std::endl;
        return 1;
    }

    AtoB.add_method(NULL, NULL, AtoB_generic_handler, &BtoA);

    AtoB.start();

    while(1){}
}

I tried several ways of handling the send_from, even built a queue, fill the queue in the callback, and process the queue and send_from in main loop but the result is the same, there's a segfault in send_data as per the backtrace below

Thread 5 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 6168.0x1b1c]
0x0000000000405bc4 in send_data (a=0x7d1ee0, from=0x6bfd00, data=0x6d2da0 "/renew", data_len=24) at .... \ext\liblo\src\send.c:474
474             sock = from->sockets[0].fd;
(gdb) bt
#0  0x0000000000405bc4 in send_data (a=0x7d1ee0, from=0x6bfd00, data=0x6d2da0 "/renew", data_len=24) at .... \ext\liblo\src\send.c:474
#1  0x0000000000405fed in lo_send_message_from (a=0x7d1ee0, from=0x6bfd00, path=0x7d1ab0 "/renew", msg=0x6d2bb0) at .... \ext\liblo\src\send.c:576
#2  0x000000000040f3f7 in lo::Address::send_from (this=0x418030 <to>, from=0x6bfd00, path=..., m=0x6d2bb0) at .... /ext/liblo/lo/lo_cpp.h:258
#3  0x00000000004015fe in AtoB_generic_handler (path=0x7d1ab0 "/renew", types=0x6d2c31 "s", argv=0x6d2cb0, argc=1, m=0x6d2bb0, data=0x6bfd00) at .... \main.cpp:15
#4  0x0000000000409eba in dispatch_method (s=0x6d24b0, path=0x7d1ab0 "/renew", msg=0x6d2bb0, sock=-1) at .... \ext\liblo\src\server.c:1942
#5  0x0000000000409bb8 in dispatch_data (s=0x6d24b0, data=0x7d1ab0, size=24, sock=-1) at .... \ext\liblo\src\server.c:1866
#6  0x000000000040937e in lo_server_recv (s=0x6d24b0) at .... \ext\liblo\src\server.c:1693
#7  0x0000000000408f90 in lo_servers_recv_noblock (s=0x294fe90, recvd=0x294fe7c, num_servers=1, timeout=10) at ... \ext\liblo\src\server.c:1577
#8  0x0000000000408fec in lo_server_recv_noblock (s=0x6d24b0, timeout=10) at... \ext\liblo\src\server.c:1587
#9  0x000000000040b62d in thread_func (data=0x7d2010) at .... \ext\liblo\src\server_thread.c:278
#10 0x00007ff8c30bb04a in msvcrt!_beginthreadex () from C:\WINDOWS\System32\msvcrt.dll
#11 0x00007ff8c30bb11c in msvcrt!_endthreadex () from C:\WINDOWS\System32\msvcrt.dll
#12 0x00007ff8c4207bd4 in KERNEL32!BaseThreadInitThunk () from C:\WINDOWS\System32\kernel32.dll
#13 0x00007ff8c4ecce51 in ntdll!RtlUserThreadStart () from C:\WINDOWS\SYSTEM32\ntdll.dll
#14 0x0000000000000000 in ?? ()

Im using mingw (64bit) on windows, will try in a linux machine and see if the problem remains, any pointers on how to proceed to fix this ?

Regards.

rbarreiros commented 4 years ago

Same code crashes in linux

Thread 2 "XControl_Proxy" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7a43700 (LWP 9162)]
send_data (a=0x55555558aee0, from=0x7fffffffe300, data=0x7ffff0000ca0 "/xinfo", data_len=12) at .... /ext/liblo/src/send.c:474
474             sock = from->sockets[0].fd;
(gdb) bt
#0  send_data (a=0x55555558aee0, from=0x7fffffffe300, data=0x7ffff0000ca0 "/xinfo", data_len=12) at ... /ext/liblo/src/send.c:474
#1  0x0000555555569ec5 in lo_send_message_from (a=0x55555558aee0, from=0x7fffffffe300, path=0x7ffff0010b70 "/xinfo", msg=0x7ffff0000b60) at ... /ext/liblo/src/send.c:576
#2  0x00005555555633d9 in lo::Address::send_from (this=0x555555578170 <to>, from=0x7fffffffe300, path=..., m=0x7ffff0000b60) at ... /ext/liblo/lo/lo_cpp.h:258
#3  0x0000555555562c6a in AtoB_generic_handler (path=0x7ffff0010b70 "/xinfo", types=0x7ffff0000bc1 "", argv=0x0, argc=0, m=0x7ffff0000b60, data=0x7fffffffe300) at ... /main.cpp:15
#4  0x000055555556e06c in dispatch_method (s=0x55555558afa0, path=0x7ffff0010b70 "/xinfo", msg=0x7ffff0000b60, sock=-1) at ... /ext/liblo/src/server.c:1942
#5  0x000055555556dcc9 in dispatch_data (s=0x55555558afa0, data=0x7ffff0010b70, size=12, sock=-1) at ... /ext/liblo/src/server.c:1866
#6  0x000055555556d471 in lo_server_recv (s=0x55555558afa0) at ... /ext/liblo/src/server.c:1693
#7  0x000055555556d143 in lo_servers_recv_noblock (s=0x7ffff7a42ea8, recvd=0x7ffff7a42eb4, num_servers=1, timeout=10) at ... /ext/liblo/src/server.c:1577
#8  0x000055555556d1b0 in lo_server_recv_noblock (s=0x55555558afa0, timeout=10) at ... /ext/liblo/src/server.c:1587
#9  0x000055555556f857 in thread_func (data=0x55555558b2a0) at ... /ext/liblo/src/server_thread.c:278
#10 0x00007ffff7fa3609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007ffff7b7f103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
radarsat1 commented 4 years ago

Hm, so this seems to be a weird C++ "type level" bug, if you want. What's happening is that there are overrides for send_from that take lo_server, so what should ideally happen is that the lo_server() operator of lo::Server (from which lo::ServerThread is derived) should be called to get a lo_server.

But, since on the C level, both lo_server and lo_server_thread are typedef'd as void*, the C++ compiler finds the more direct route of calling lo::ServerThread::lo_server_thread(). It then dispatches the send_data(lo_server, ..), erroneously passing the lo_server_thread as the lo_server.

(By the way you should be passing your lo::ServerThread to send_data by reference, not by pointer as you have it here.)

From looking at lo_cpp.h, I tried to handle this by providing a specific override for lo::ServerThread&. But, even if I fix your code to pass in a lo::ServerThread&, it's not getting called, instead choosing to always call the variant that takes lo_server, and I don't know why.

One fix might be to redefine lo_server and lo_server_thread to be pointers to empty, distinct structs, instead of to void*, so that the compiler doesn't see them as the same type. Theoretically that shouldn't break existing code? I'll have to give it a try.

radarsat1 commented 4 years ago

In the meantime, this seems to work:

lo::ServerThread& server = *reinterpret_cast<lo::ServerThread*>(data);
toB.send_from(lo_server_thread_get_server(server), path, m);
radarsat1 commented 4 years ago

I took the leap and redefined the liblo types to fix this problem: 2c1ef1c682e01aabec511223b52d9d845063d6dc

Maybe this will warrant a major version bump. Let's see if people report incompatibilities.

peternewman commented 7 months ago

Maybe this will warrant a major version bump. Let's see if people report incompatibilities.

Just FYI, unfortunately this does break compatibility with our existing code @radarsat1 : https://github.com/OpenLightingProject/ola/issues/1949