ufo-kit / uca-phantom

A libuca plugin for the phantom high speed cameras
0 stars 0 forks source link

uca_phantom_get_variable freezes after too many immediate consecutive calls #7

Closed gabs1234 closed 11 months ago

gabs1234 commented 1 year ago

Problem description

Behaviour is undefined when doing the following:

    GError *error = NULL;
    GValue val = G_VALUE_INIT;
    gboolean result = FALSE;

    UcaPhantomCommunicate *communicator = uca_phantom_communicate_new();
    gboolean connected = uca_phantom_communicate_attempt_connect(communicator, &error);

    if (!connected) {
        g_print ("Houston theres a problem: %s\n", error->message);
        g_error_free (error);
        g_object_unref (communicator);
        return FALSE;
    }

    struct timespec req, rem;
    req.tv_sec = 0;
    req.tv_nsec = 5e+6;

    for (int i=0; i < N_UNIT_PROPERTIES; i++) {
        result = uca_phantom_get_variable (communicator, i, &val, &error);
        if (!result && error != NULL) {
            g_print ("Houston we've got a problem: %s\n", error->message);
            g_error_free (error);
            g_object_unref (communicator);
            return FALSE;
        }
        print_gvalue (i, &val);
        g_value_unset (&val);
        // nanosleep(&req, &rem);
    }

    g_object_unref (communicator);

Indeed, after a certain and seemingly random amount of requests, the program feezes. However, by adding a blocking sleep command at the end of the loop, all requests are made and all answers are received.

Valgrind output

[...]
==45048== Process terminating with default action of signal 2 (SIGINT)
==45048==    at 0x4D01D7F: poll (poll.c:29)
==45048==    by 0x4AB12AD: g_socket_condition_timed_wait (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45048==    by 0x4AB161E: ??? (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45048==    by 0x4A8F1F6: g_input_stream_read (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45048==    by 0x10CDF9: uca_phantom_communicate (uca-phantom-communicate.c:517)
==45048==    by 0x10CDF9: uca_phantom_get_variable (uca-phantom-communicate.c:670)
==45048==    by 0x10B9B4: main (test_network.c:51)
==45048== 
==45048== HEAP SUMMARY:
==45048==     in use at exit: 278,549 bytes in 2,329 blocks
==45048==   total heap usage: 11,650 allocs, 9,321 frees, 9,845,914 bytes allocated
==45048== 
==45048== LEAK SUMMARY:
==45048==    definitely lost: 14 bytes in 1 blocks
==45048==    indirectly lost: 0 bytes in 0 blocks
==45048==      possibly lost: 3,216 bytes in 4 blocks
==45048==    still reachable: 250,999 bytes in 2,089 blocks
==45048==         suppressed: 0 bytes in 0 blocks
==45048== Rerun with --leak-check=full to see details of leaked memory
==45048== 
==45048== For lists of detected and suppressed errors, rerun with: -s
==45048== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The memory leak is likely due to the communicator object not being freed after the ctl+c signal.

gabs1234 commented 1 year ago

sizeof(communicator) gives 8. So there's actually some memory leaking going on. With --leak-check=full, I find:

==45885== HEAP SUMMARY:
==45885==     in use at exit: 278,165 bytes in 2,329 blocks
==45885==   total heap usage: 6,583 allocs, 4,254 frees, 2,483,386 bytes allocated
==45885== 
==45885== 14 bytes in 1 blocks are definitely lost in loss record 139 of 1,927
==45885==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==45885==    by 0x4935738: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4)
==45885==    by 0x494A583: g_strdup (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4)
==45885==    by 0x4A8581C: g_inet_address_to_string (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45885==    by 0x10C70E: uca_phantom_communicate_discover (uca-phantom-communicate.c:456)
==45885==    by 0x10C70E: uca_phantom_communicate_attempt_connect (uca-phantom-communicate.c:565)
==45885==    by 0x10B982: main (test_network.c:37)

So a malloc isn't being free'd before I ctrl+c.

gabs1234 commented 1 year ago

So actually I introduced the memroy leak by letting github copilot malloc something while forgetting to free it... Darn AI.

That being said, the main issue persists.

gabs1234 commented 1 year ago

Probable cause

I might simply be overwhelming the IO capacity of the phantom camera (or the PC ? seems unlikely).

This would explain why the only solution to the problem is adding some waiting time between each call.

Solutions:

  1. throttling of data stream (i.e. reduce the frequency)
  2. use async methods
    • write_all_async
    • or even buffering (new_thread + queue)
gabs1234 commented 11 months ago

Issue was fixed in cad4a3c9719a115463e9091ebcc32df99ea30731: was caused by some trailing carriage return