jasta / esp32-tokio-demo

Demo of tokio running on esp32 using esp-idf
MIT License

Possible Memory Leak with tokio::TcpStream #2

Open sampaioletti opened 1 year ago

sampaioletti commented 1 year ago

Thanks for the work you've done on bringing tokio to the ESP. I think it's a great scenario; it's the closest we have come to successfully sharing our codebase from other platforms.

I'm trying to figure out how to narrow this down, but tokio TCP streams seem to cause a memory leak of around 250 bytes that I have been unable to reproduce on other platforms. It appears to happen when a connection is never established.

// this section leaks on the heap
loop {
    log::info!(
        "Stack High Water Mark {} Heap {}",
        unsafe { esp_idf_sys::uxTaskGetStackHighWaterMark2(std::ptr::null_mut()) }, // no leak
        unsafe {
            esp_idf_sys::heap_caps_get_free_size(esp_idf_sys::MALLOC_CAP_8BIT) // leaks about 250 bytes
        }
    );
    let conn = tokio::net::TcpStream::connect("192.168.1.10:12345").await; // this endpoint doesn't exist so connection will fail

    match conn {
        Ok(_) => {}
        Err(_) => {}
    }
    tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;
}

// this non async version does not
loop {
    log::info!(
        "Stack High Water Mark {} Heap {}",
        unsafe {
            esp_idf_sys::uxTaskGetStackHighWaterMark2(std::ptr::null_mut())
        },
        unsafe {
            esp_idf_sys::heap_caps_get_free_size(esp_idf_sys::MALLOC_CAP_8BIT)
        } //stays consistent
    );

    let conn = std::net::TcpStream::connect("192.168.1.10:12345"); //this endpoint doesn't exist so connection will fail

    match conn {
        Ok(_) => {}
        Err(_) => {}
    }
    std::thread::sleep(std::time::Duration::from_secs(5));
}

I've tried playing with the timing to see whether the connections just weren't being released yet, but it never stabilizes: free heap keeps decreasing until the device runs out of memory, whereas the sync version stabilizes after the first allocation.

Any thoughts on what might cause this, or how I could figure out what might be causing it? I'm not an ESP expert (yet), so I'm still learning the tooling.
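To make the "stabilizes after the first allocation" versus "keeps decreasing" distinction less dependent on eyeballing the log, the free-heap readings could be collected and summarized. This is only a minimal sketch: `HeapTrend` and `heap_trend` are hypothetical names (not part of esp-idf-sys or tokio), and it assumes one reading of `heap_caps_get_free_size` is pushed into a `Vec<usize>` per loop iteration.

```rust
/// Summary of a series of free-heap readings, one per loop iteration.
/// (Hypothetical helper for illustration; not an esp-idf-sys or tokio type.)
#[derive(Debug, PartialEq)]
pub struct HeapTrend {
    /// Bytes lost between the first and second reading (one-time setup cost).
    pub warmup_drop: i64,
    /// Integer average of bytes lost per iteration after the first one,
    /// truncated toward zero.
    pub steady_drop_per_iter: i64,
}

/// Splits the total heap loss into a warm-up part and a steady per-iteration part.
/// Returns `None` when there are fewer than three readings, because at least
/// one step after the warm-up step is needed to say anything about a trend.
pub fn heap_trend(samples: &[usize]) -> Option<HeapTrend> {
    if samples.len() < 3 {
        return None;
    }
    let first = samples[0] as i64;
    let second = samples[1] as i64;
    let last = samples[samples.len() - 1] as i64;
    // Number of iteration steps after the warm-up step.
    let steady_steps = (samples.len() - 2) as i64;
    Some(HeapTrend {
        warmup_drop: first - second,
        steady_drop_per_iter: (second - last) / steady_steps,
    })
}
```

A `steady_drop_per_iter` near zero would match the behaviour of the sync loop, while a value that stays around 250 would match the async one.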

Thanks!

jasta commented 1 year ago

Great catch! That's the kind of thing I was really hoping we could find before moving to formal support (i.e. removing the experimental mio flag). That said, I can't think of any reason why the poll implementation I created would do this; there's no unsafe code or anything particularly fancy going on. Hmm, a few things come to mind to help narrow it down though:

  1. Check whether UdpSocket does the same (i.e. send_to to a non-routable address or closed port).
  2. Confirm that TcpSocket.connect to a working address does NOT leak.
  3. Check whether TcpListener.accept does the same (accept a connection then close it immediately). For this you can test with a host machine running: while :; do echo foo | nc espressif 12345; sleep 1; done to just spam it with connections.

If none of those scenarios produces unexpected results, then I'd start looking into the Tokio TcpStream error path to see if there's anything suspicious going on, like a mem::forget call (that sounds pretty hand-wavy, I know) or something host-specific in a way that wouldn't match on #[cfg(unix)] (esp32 is considered a UNIX implementation). You might start your journey down that path here: https://github.com/tokio-rs/tokio/blob/master/tokio/src/net/tcp/socket.rs#L653

Personally I'd probably get a debugger set up and step through each line in tokio until you get to the error path. I'd have a separate thread calling the heap-printing function you're using and just watch it carefully as you go. It's a bit of a pain: you've gotta get a USB cable with breakout jumper wires that you can connect to the USB_D-/D+ pins, then set up openocd (at least that's what I used in the past). You also have to disable the watchdog timer, which will likely fire as you're stepping through. See here for more details: https://esp-rs.github.io/book/tooling/debugging/openocd.html. Also note that I find the ESP32-C3 way easier to use because the RISC-V support is better than Xtensa across various toolchains, but you should be able to get any chip working. It's a good investment if you're new to embedded, because it can be really tempting to just never use the debugger and then spend days and days tearing your hair out trying to fix simple bugs :)

jasta commented 1 year ago

Oh, and with verbose logging you should see the console spamming each select fd set, so you can check for an easy one, like me not removing the failed fd from the select set. If the set just keeps getting bigger and bigger then it was definitely my fault :P

sampaioletti commented 1 year ago

Great, I'll see what I can do. As I mentioned, it doesn't happen on other Unix instances; it seems to be specific to the ESP. I'll see what I can find tomorrow. I have all the hardware to set up a debugger, I just haven't gotten to that point yet, as any bug in our code has been easier to find on other platforms (: We are just starting to port to ESP.

Thanks again for your help!

-Sam



sampaioletti commented 1 year ago

I set up the debugger and have not yet been able to figure out where it's happening. I'll have to keep practicing with it until I get better... not quite as easy to debug as I'm used to (:

If the socket connects and is then subsequently dropped, it doesn't leak, so it's something specific to a failed connection attempt not being freed. I just haven't figured out what.

Bind doesn't leak either.

I did try to run the code as identically as possible with valgrind on WSL and had no memory leaks.

sampaioletti commented 1 year ago

tokio::net::UnixStream and tokio::net::UdpSocket don't leak.

sampaioletti commented 9 months ago

I haven't given up on this issue and I'm still looking into it, but if anyone comes across this: I was able to work around it by creating a std::net::TcpStream, calling set_nonblocking(true) on it, and converting it with the tokio::net::TcpStream::from_std method upon success.
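For anyone wanting to try the same workaround, the std half might look roughly like the sketch below. The helper name `connect_std_nonblocking` and the use of `connect_timeout` are assumptions for illustration; the comment above does not say how the blocking connect was actually issued. Note that the connect itself blocks the calling thread, so in an async context it may need to run somewhere it cannot stall the runtime.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: connect with the blocking std API, then switch the
/// socket to non-blocking mode so that tokio can adopt it afterwards.
/// `connect_timeout` is an assumption made here to bound the blocking call.
pub fn connect_std_nonblocking(addr: SocketAddr, timeout: Duration) -> io::Result<TcpStream> {
    // A failed connect returns early here, so tokio never sees a failed attempt.
    let stream = TcpStream::connect_timeout(&addr, timeout)?;
    // tokio expects the std stream to already be in non-blocking mode.
    stream.set_nonblocking(true)?;
    Ok(stream)
}

// Hand-off to tokio (must be called from within a tokio runtime context):
//
//     let std_stream = connect_std_nonblocking(addr, Duration::from_secs(5))?;
//     let stream = tokio::net::TcpStream::from_std(std_stream)?;
```

Because tokio only receives sockets that are already connected, this avoids the failing `tokio::net::TcpStream::connect` path where the leak was observed; it does not explain the leak itself.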

This really makes me think that it is not necessarily due to the tokio runtime implementation but something about the way it creates/drops the underlying lwIP connection in an async context... maybe there is some reliance on a drop or something that doesn't have an async equivalent. So I'm not giving up, but the workaround is keeping my project moving until I can circle back and figure this out.