ssilverman / QNEthernet

An lwIP-based Ethernet library for Teensy 4.1 and possibly some other platforms
GNU Affero General Public License v3.0
86 stars 24 forks source link

Connecting to a client that is also being served by EthernetServer leads to crash #24

Closed sparks-alec closed 1 year ago

sparks-alec commented 1 year ago

Hi! This library is really fantastic, I suspect I'm doing something unusual or wrong.

My Teensy hosts an EthernetServer web server on port 80. I also wish to connect my Teensy to an app on a Mac via a particular network port. I don't always know if the app is running.

The Teensy periodically attempts to .connect() to the application on port 3037. If the IP of the web client and the IP of the app Mac are the same (ie, the Mac loaded the web server from a browser) and the application is not running (so the port isn't open) the Teensy hangs on the next .connect().

The issue does not happen if .connect() is successful, it only happens if the app isn't running. It also does not happen if the web page is loaded from a separate Mac.

I've attached a stripped-down demonstration sketch. If I un-comment out the .stop() on line 54, the hang only happens if the web server is accessed during .connect()'s timeout.

Steps to reproduce:

tcptest.ino.zip

Thanks for any help, and let me know if I can clarify. It's tricky to explain!

ssilverman commented 1 year ago

Thanks for using QNEthernet. I have some comments and notes.

  1. In the onAddressChanged() listener, I would use Serial.println(Ethernet.localIP()); instead. In my opinion, that's better practice.
  2. I'm not convinced the delay(100) is necessary after calling Ethernet.begin().
  3. Use webClient.close() instead of closeOutput(). closeOutput() is for doing a half-close, and I don't think that's what you intend to do here. (Note that close() is almost the same as stop(), the difference being it doesn't wait (until close or a timeout) for the close to complete.)

EthernetClient::connect() is a blocking operation, and so may take some time to complete. Internally, if there's already a connection open for a specific object (in this case, you're always using client1), it will first call stop(), and this blocks (until close or a timeout), so I just changed the library (not yet pushed) to call close() instead, so there's no blocking. In addition, it blocks until a connection is made or a timeout.

Currently, there's no support for non-blocking connect(), but I'm pondering a good way to add that.

I'd change your code to either use new connection objects, or first call client1.close() when you no longer need the connection, before attempting to reconnect.

ssilverman commented 1 year ago

Just following up. Did my notes help?

sparks-alec commented 1 year ago

Apologies, got distracted by another project. You helped a ton- thank you!

ssilverman commented 1 year ago

Would you consider this issue closed? If not, happy to try to answer other related questions.

sparks-alec commented 1 year ago

I've made your suggested changes and fiddled around a bunch, but it still seems like something doesn't like it if another device attempts to connect to the Teensy while it itself is trying to connect().

This leads to a total hault on the Teensy, not just waiting for a timeout. I stop receiving Serial from it. Is that just how .connect() has to work since it blocks?

Here's my code (updated with your suggestions) and a screenshot of the Serial monitor showing where Serial output stops. tcptest.zip

Screen Shot 2023-01-10 at 12 55 31 AM

ssilverman commented 1 year ago

I'm looking more at this now. Which .ino file did you want me to try? The one inside the zip file, or the one inside the zip file inside the zip file?

Update: I see which is the later one: the one directly inside the top-layer zip.

ssilverman commented 1 year ago

In looking at these files, you're calling Ethernet.begin twice. Did you want to use DHCP or a static IP? If DHCP, then call Ethernet.begin(). If a static IP, call Ethernet.begin(ip, netmask, gateway[, dns]).

ssilverman commented 1 year ago

I've been playing more with code as close to your code as possible and can't see any failure. I changed these things:

  1. The static IP address/netmask/gateway for my network
  2. Changed the client to try to connect to port 80 to a machine on my local network that has an HTTP server
  3. Using if (Ethernet.begin(IP_Static, IP_Subnet, IP_Gateway)) instead of if(Ethernet.begin())
  4. Removed the second begin call

Here's the new code after the "setup" print:

  if(Ethernet.begin(IP_Static, IP_Subnet, IP_Gateway)){
    Serial.println("starting server 1 on port 80");
    if (!server1.begin(80)) {
      Serial.println("Failed to start server!");
    }
  } else {
    Serial.println("Failed to start Ethernet!");
  }

Here's the new client connect code. Note that I'm printing the connect() result instead of the string "gave up":

  if(millis()%3000==0){
    EthernetClient client1;
    IP_Dest = IPAddress{192, 168, 1, 1};
    Serial.print("attempting to connect to port 80 on ");
    Serial.print(IP_Dest);
    Serial.print(": ");
    Serial.println(client1.connect(IP_Dest, 80));
    client1.close();
    // Serial.println(" ...gave up");
  }else{

The connect() return values are documented here: https://www.arduino.cc/reference/en/libraries/ethernet/client.connect/

I can connect to the Teensy as many times as I want, and in addition, the Teensy is successfully connecting to my HTTP server.

Which version of QNEthernet are you using? I just tested your code with v0.17.0 and also my latest internal version with some fixes, and both work. (Update: All my latest changes have been pushed to master.)

ssilverman commented 1 year ago

I just pushed some changes that may fix the issue, specifically commit 7724fbe68ace503e2e239546c9b9c935f9cd2f1c. See the latest master.

sparks-alec commented 1 year ago

This fixed my problem! Thank you very very much for your help-this is a really fantastic library.

sparks-alec commented 1 year ago

ack, I was wrong. Your code works great on my network unless IP_Dest is the same as webClient.remoteIP();

If they're the same, and I refresh the webpage a few times, the teensy appears to hang indefinitely.

My testing is specifically targeting a server that maybe doesn't exist- I'm curious if in your testing, you can get the issue to happen if your HTTP server on port 80 isn't running.

ssilverman commented 1 year ago

I can see the crash and I'm seeing one of these two asserts:

  1. "tcp_input: pcb->next != pcb (before cache)"
  2. "tcp_input: TIME-WAIT pcb->state == TIME-WAIT"

These are both in lwIP's _tcpin.c. This often points to a concurrency (threading, interrupts, or whatever) problem, but:

  1. Nothing in the network stack is being called from an interrupt context, and
  2. There are no threads.

The checks:

  1. NO_SYS is 1
  2. I've implemented LWIP_ASSERT_CORE_LOCKED() with code that checks for being inside an interrupt. That function asserts if it is, and there's no such asserts when running.

I'm still looking...

ssilverman commented 1 year ago

@sparks-alec does this latest commit help: b69228f5a72ebf960f04f4d47d620b3741c47387? It seems to help on my end.

sparks-alec commented 1 year ago

It helps on my end too. I think it's fixed. Thank you!

(my specific projectis a custom controller for ETC Eos lighting consoles, which I now realize is in your realm of expertise!)