vjmuzik / NativeEthernet

Native Ethernet library for Teensy 4.1
http://www.pjrc.com/teensy/td_libs_Ethernet.html
MIT License

Enabling mDNS causes hang, then reboot of Teensy 4.1 #23

Open playaspec opened 2 years ago

playaspec commented 2 years ago

I'm putting together a device that reads two ADCs (simultaneous sampling) and sends the values over OSC to TouchOSC. I originally had all the networking worked out on the ESP32, but switched to the Teensy because of its more flexible SPI hardware. Also, ESP32 boards with onboard Ethernet were hard to find, and it didn't look like I could get them in time to meet my deadline.

I stripped my original code down to the bare minimum:

#include <NativeEthernet.h>
//#include <NativeEthernetUdp.h>

// Extract the hardware MAC from uC itself, and stuff it into an array
uint8_t mac[6];
void teensyMAC(uint8_t *mac) {
    for(uint8_t by=0; by<2; by++) mac[by]=(HW_OCOTP_MAC1 >> ((1-by)*8)) & 0xFF;
    for(uint8_t by=0; by<4; by++) mac[by+2]=(HW_OCOTP_MAC0 >> ((3-by)*8)) & 0xFF;
    Serial.printf("MAC: %02x:%02x:%02x:%02x:%02x:%02x\n", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}

// Mirror the enum EthernetHardwareStatus in NativeEthernet.h so we can print a human-readable value.
// Shouldn't this enum be updated to have a "NativeEthernet" value?
char HWStatus[][14] = {"EthNoHardware",
                     "EthernetW5100",
                     "EthernetW5200",
                     "EthernetW5500"};

//// buffers for receiving and sending data. Currently unused.
//char packetBuffer[UDP_TX_PACKET_MAX_SIZE];  // buffer to hold incoming packet,
//char ReplyBuffer[] = "acknowledged";        // a string to send back
//
//// An EthernetUDP instance to let us send and receive packets over UDP
//EthernetUDP udp;
//
////IP address to send UDP data to:
//const char * udpAddress = "192.168.1.50";
const int udpPort = 8000;
//
//// Not connected yet!
//boolean connected = false;

int ledPin = 13;
int value = 0;

void setup(void) {

  Serial.begin(115200);

  pinMode(ledPin, OUTPUT);
  digitalWrite(ledPin, HIGH);

  Ethernet.setStackHeap(1024 * 64);
  Ethernet.setSocketSize(1024 * 16);
  Ethernet.setSocketNum(1);

  teensyMAC(mac);
  Ethernet.begin(mac);

  Serial.print("IP  address: ");
  Serial.println(Ethernet.localIP());
  Serial.send_now();
  Serial.println(HWStatus[Ethernet.hardwareStatus()]);
  Serial.send_now();

//  This sketch will run indefinitely with mDNS commented out, hangs then reboots when enabled.
  MDNS.begin("Teensy41", 1); //.local Domain name and number of services
  MDNS.setServiceName("Teensy41_OSC"); // Optional: change the advertised service name
  MDNS.addService("_osc._udp", udpPort); // Advertise the OSC service on udpPort

}

void loop(void) {
  // Count loop iterations, toggle LED and print value every 10 million loops as an alternative to blocking with delay()
  if((value % 10000000) == 0) {
    digitalWrite(ledPin, !digitalRead(ledPin));
    Serial.print("Loop:  ");
    Serial.println(value / 10000000);
    Serial.send_now();
    fnet_service_poll();  // Does the user even need to call this? Ideally, the Ethernet library would
                          // run a timer to execute housekeeping tasks automatically. Is this already done?
                          // Calling it made no difference when the mDNS service was enabled.
    if (value >= 1000000000 ) value = 0;
  }

  value++;
}

The number of iterations before the hang varies, and occasionally it won't reboot at all, requiring manual intervention.
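The sketch above paces its LED toggle and its fnet_service_poll() call by counting loop() iterations, so the interval depends on how fast loop() happens to spin. For comparison, a time-based gate is a common non-blocking alternative to delay(). The version below is portable C++ so it can be tested off-target; `IntervalGate` is a hypothetical name, not a NativeEthernet or Teensyduino class, and on the Teensy `now_ms` would presumably come from millis().

```cpp
#include <cstdint>

// Hypothetical helper (not part of NativeEthernet or Teensyduino): fires at
// most once per period, driven by a 32-bit millisecond counter such as the
// value returned by millis() on the Teensy.
class IntervalGate {
 public:
  explicit IntervalGate(std::uint32_t period_ms) : period_ms_(period_ms) {}

  // Returns true when at least period_ms has elapsed since the last time it
  // returned true. Unsigned subtraction keeps the comparison correct when the
  // 32-bit counter wraps around (roughly every 49.7 days).
  bool due(std::uint32_t now_ms) {
    if (now_ms - last_ms_ >= period_ms_) {
      last_ms_ = now_ms;
      return true;
    }
    return false;
  }

 private:
  std::uint32_t period_ms_;
  std::uint32_t last_ms_ = 0;
};
```

In loop(), something like `static IntervalGate ledGate(500);` followed by `if (ledGate.due(millis())) { ... }` would replace the `value % 10000000` test, making the toggle rate independent of how much work each pass does.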

playaspec commented 2 years ago

Shortly after posting this, I continued my due diligence trying to track the problem down, and I may have found it. I was originally testing on a very busy university LAN: there are close to 1,000 machines on this subnet, and over 50 different mDNS services being advertised, many with dozens of hosts per service. To debug this more easily, I directly attached the Teensy to the second Ethernet port on my workstation, ran dnsmasq on that port (so I control DHCP and DNS), started Wireshark to see what was going on, and the problem went away!

I suspect that the heavy mDNS traffic eventually ran the Teensy out of memory as it tried to keep track of so many machines and services coming and going. I'm currently only announcing the presence of my service, but I may later want to find the client via mDNS as well.

Is there a way to tell the mDNS server to ignore all service types except the ones I'm interested in? Filtering out irrelevant services seems essential on memory-constrained systems like microcontrollers. A method like subscribeService(), or some similar mechanism for tracking only the services of interest, will likely be needed in many situations.
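To make the request concrete, here is a minimal sketch of the filtering half of such a mechanism: an allowlist that decides whether an mDNS name belongs to a subscribed service type. It is an illustration in portable C++ under loud assumptions: `ServiceFilter`, `subscribe()` and `accepts()` are hypothetical and do not exist in NativeEthernet or FNET, and where such a check would hook into the responder is not shown.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch of the subscribeService() idea: an allowlist of mDNS
// service types. NOT an existing NativeEthernet/FNET API; it only shows how a
// responder could cheaply discard names it does not care about.
class ServiceFilter {
 public:
  // Register a service type such as "_osc._udp" (".local" is appended).
  void subscribe(const std::string& service_type) {
    types_.push_back(lower(service_type) + ".local");
  }

  // True if `name` is a subscribed service type ("_osc._udp.local") or an
  // instance of one ("Teensy41_OSC._osc._udp.local"). DNS names compare
  // case-insensitively, and a trailing root dot is ignored.
  bool accepts(const std::string& name) const {
    const std::string n = lower(strip_trailing_dot(name));
    for (const std::string& t : types_) {
      if (n == t) return true;
      // Instance names end in ".<type>": require the dot before the suffix.
      if (n.size() > t.size() &&
          n.compare(n.size() - t.size(), t.size(), t) == 0 &&
          n[n.size() - t.size() - 1] == '.') {
        return true;
      }
    }
    return false;
  }

 private:
  static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    return s;
  }
  static std::string strip_trailing_dot(std::string s) {
    if (!s.empty() && s.back() == '.') s.pop_back();
    return s;
  }
  std::vector<std::string> types_;
};
```

A responder could call accepts() on each incoming question or record name and drop non-matching ones before doing any further work on them.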

vjmuzik commented 2 years ago

The mDNS code does not allocate any additional memory on its own, no matter how many clients send to it, because it does not store anything about those clients. That said, heavy network traffic may be exhausting the stack's memory, and there's no way around that if it happens before the packets reach the mDNS server. As far as the stack knows, everything arriving on the mDNS port is valid data that you want, which it very well could be. You can easily check FNET's stack size every so often in your code; if there are any memory leaks in the stack, you will see it start to dwindle.

Serial.printf("FNET_Free: %lu  FNET_Max: %lu\n", (unsigned long)fnet_free_mem_status(), (unsigned long)fnet_malloc_max());
Serial.send_now();

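One way to make "start to dwindle" concrete is to keep a little state across the periodic samples. The helper below is a hypothetical sketch in portable C++, not part of FNET: on the Teensy you would pass it the value of fnet_free_mem_status() from the snippet above every few seconds, then print low_water_mark() and falling_run() next to it.

```cpp
#include <cstddef>

// Hypothetical helper: feed it periodic samples of FNET's free heap (for
// example fnet_free_mem_status()). It tracks the lowest value seen and how
// many consecutive samples have each been lower than the one before.
class FreeMemTrend {
 public:
  void sample(std::size_t free_bytes) {
    if (count_ == 0 || free_bytes < min_) min_ = free_bytes;
    if (count_ > 0 && free_bytes < last_) {
      ++falling_run_;  // still shrinking
    } else {
      falling_run_ = 0;  // flat or recovered: reset the run
    }
    last_ = free_bytes;
    ++count_;
  }

  std::size_t low_water_mark() const { return min_; }
  std::size_t falling_run() const { return falling_run_; }

  // True once at least `n` consecutive samples have each decreased.
  bool looks_like_leak(std::size_t n) const { return falling_run_ >= n; }

 private:
  std::size_t count_ = 0;
  std::size_t last_ = 0;
  std::size_t min_ = 0;
  std::size_t falling_run_ = 0;
};
```

A run that keeps growing over many samples points at a leak; a low-water mark that settles and stays put points elsewhere.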
playaspec commented 2 years ago

> You can easily check FNET's stack size every so often in your code; if there are any memory leaks in the stack, you will see it start to dwindle.

Memory leaks didn't seem to be the issue, as memory use stayed more or less stable, but moving the Teensy to an isolated network stopped the crashing. There's quite a bit of junk traffic on the public network fuzzing away at the FNET stack, so maybe the problem is there. Once I get out from under this deadline, I'm going to dig deeper by capturing traffic until it crashes, then drip-feeding the capture back until I can isolate what's causing the crash. If it's in FNET, I'll follow up with a closing comment and take the issue over there. Thanks for getting back to me so quickly.
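The drip-feed plan can be made systematic with a binary search over the capture, so that a capture of N packets needs only about log2(N) replays instead of N. This is a sketch under two loud assumptions: the crash reproduces deterministically when the same packets are replayed, and replaying more packets never makes it go away. `smallest_crashing_prefix` and the `crashes_after` callback (which would replay the first k packets and report whether the Teensy went down) are hypothetical; the replay itself is not shown.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: binary-search for the smallest number of packets k such
// that replaying packets [0, k) of the capture crashes the device. Assumes the
// crash is reproducible and monotonic in k. Returns 0 if even the full capture
// does not crash it.
std::size_t smallest_crashing_prefix(
    std::size_t total, const std::function<bool(std::size_t)>& crashes_after) {
  if (total == 0 || !crashes_after(total)) return 0;
  std::size_t lo = 1, hi = total;  // invariant: the answer lies in [lo, hi]
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (crashes_after(mid)) {
      hi = mid;  // mid packets are enough; look for a shorter prefix
    } else {
      lo = mid + 1;  // mid packets are not enough
    }
  }
  return lo;
}
```

If the crash depends on timing rather than on the packets themselves, the monotonic assumption breaks and the answer can be misleading, so it is worth replaying the returned prefix a few times to confirm it.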

natcl commented 2 years ago

I have the exact same issue, but on my home network, which is much smaller than a university's. Were you able to investigate this further? Thanks!

natcl commented 2 years ago

Also, I tested with the lines you suggested for checking free memory, and there doesn't seem to be a leak, so it's probably blocking somewhere...
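One way to test the "blocking somewhere" theory is to measure how long loop() takes to come around again. The hypothetical `LoopGapMonitor` below (portable C++, not part of any library) is meant to be fed millis() once per loop() pass, with max_gap_ms() printed occasionally. It can only report stalls that eventually recover; if loop() never runs again, nothing inside loop() can report it.

```cpp
#include <cstdint>

// Hypothetical helper: call tick(millis()) once per pass through loop(). It
// records the longest gap between consecutive passes. Unsigned subtraction
// tolerates the 32-bit millis() wrap.
class LoopGapMonitor {
 public:
  // Returns the gap since the previous tick (0 on the first call).
  std::uint32_t tick(std::uint32_t now_ms) {
    std::uint32_t gap = started_ ? now_ms - last_ms_ : 0;
    if (gap > max_gap_ms_) max_gap_ms_ = gap;
    last_ms_ = now_ms;
    started_ = true;
    return gap;
  }

  std::uint32_t max_gap_ms() const { return max_gap_ms_; }

 private:
  bool started_ = false;
  std::uint32_t last_ms_ = 0;
  std::uint32_t max_gap_ms_ = 0;
};
```

A maximum gap far above the normal loop period, climbing before each reboot, would support the blocking theory; a gap that stays small right up to the reboot would point at something else.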

playaspec commented 2 years ago

I haven't had a chance to look into it further, although it would be fairly easy to set up. What would be useful is a mechanism to detect the crash so I can stop Wireshark; captures on this network get very big, very fast.
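One possible approach, sketched here under stated assumptions: have the Teensy send a periodic heartbeat packet, keep only the most recent packets on the host in a bounded buffer, and declare a crash when the heartbeat stops. `CrashTrigger` and `Packet` are hypothetical names in portable C++; they are not Wireshark or NativeEthernet APIs, and the host-side packet source (libpcap or similar) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical host-side helper. Assumes the Teensy emits a periodic heartbeat
// packet and that now_ms comes from a monotonic host clock.
using Packet = std::vector<std::uint8_t>;

class CrashTrigger {
 public:
  CrashTrigger(std::size_t capacity, std::uint64_t timeout_ms)
      : capacity_(capacity), timeout_ms_(timeout_ms) {
    assert(capacity > 0);
  }

  // Record a captured packet; only the newest `capacity` packets are kept.
  void feed(const Packet& pkt, std::uint64_t now_ms, bool is_heartbeat) {
    if (recent_.size() == capacity_) recent_.pop_front();  // drop the oldest
    recent_.push_back(pkt);
    if (is_heartbeat) {
      last_heartbeat_ms_ = now_ms;
      seen_heartbeat_ = true;
    }
  }

  // True once a heartbeat has been seen and none has arrived for longer than
  // timeout_ms.
  bool crashed(std::uint64_t now_ms) const {
    return seen_heartbeat_ && now_ms - last_heartbeat_ms_ > timeout_ms_;
  }

  // The packets leading up to the crash, oldest first.
  std::vector<Packet> snapshot() const {
    return std::vector<Packet>(recent_.begin(), recent_.end());
  }

 private:
  std::size_t capacity_;
  std::uint64_t timeout_ms_;
  std::deque<Packet> recent_;
  std::uint64_t last_heartbeat_ms_ = 0;
  bool seen_heartbeat_ = false;
};
```

Because the buffer holds at most `capacity` packets, memory stays bounded no matter how long the device runs before crashing; once crashed() returns true, snapshot() holds the traffic that led up to it, which is the input the drip-feed replay needs.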