njh / EtherCard

EtherCard is an IPv4 driver for the ENC28J60 chip, compatible with Arduino IDE
https://www.aelius.com/njh/ethercard/
GNU General Public License v2.0

Strange DHCP behaviour #45

Closed aquadat0r closed 11 years ago

aquadat0r commented 12 years ago

Hi there,

DHCP is reliable when I connect directly to the router (the DHCP server); however, when I connect via various switches it fails.

At work, with enterprise switches that are daisy chained, it also fails at obtaining an IP via DHCP. Any ideas why this would be happening?

My PCs all work fine. It works 100% when using the NanodeUIP library.

Do I need to enableBroadcast?

Thanks John

vicatcu commented 12 years ago

I can attest to flakiness of DHCP in "enterprise network" situations (i.e. universities, public buildings) where other libraries and wiznet based devices seem to be able to cope... would be excellent to get to the bottom of this. I have some captured wireshark data that I can share if it helps.

AndrewFischer commented 12 years ago

Me too. Appears to be worse in the current version. I've tried increasing the timeout to 3 minutes but that hasn't helped.

thiseldo commented 12 years ago

Apologies that no one has replied yet. I haven't had a chance to look at this, but in recent experience I have seen packets get missed for some reason. In particular, I was seeing occasional HTTP responses being missed or lost, and never got to the bottom of it.

I've seen that some DHCP servers respond to the broadcast MAC address, and if the library is filtering out broadcast frames then the responses won't be seen. I did have code in the original EtherShield library to alter the broadcast MAC filter to allow responses this way while waiting for DHCP responses.
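To make the broadcast-filter point concrete, here is a minimal sketch of how the ENC28J60 receive filter could treat a DHCP reply sent to ff:ff:ff:ff:ff:ff. The ERXFCON bit masks are the ones listed in the datasheet; `isBroadcast`, `sameMac` and `frameAccepted` are hypothetical helpers written for this illustration, not EtherCard functions. The model assumes OR mode (ANDOR = 0) and that the pattern-match, hash, multicast and magic-packet filter bits are all clear.

```cpp
#include <cstdint>
#include <cstring>

// ERXFCON bit masks as listed in the ENC28J60 datasheet.
constexpr uint8_t ERXFCON_UCEN  = 0x80; // accept unicast frames addressed to our MAC
constexpr uint8_t ERXFCON_CRCEN = 0x20; // discard frames with an invalid CRC
constexpr uint8_t ERXFCON_BCEN  = 0x01; // accept broadcast frames

// Hypothetical helper: true if dst is ff:ff:ff:ff:ff:ff.
bool isBroadcast(const uint8_t dst[6]) {
    for (int i = 0; i < 6; ++i)
        if (dst[i] != 0xFF) return false;
    return true;
}

// Hypothetical helper: true if both 6-byte addresses are identical.
bool sameMac(const uint8_t a[6], const uint8_t b[6]) {
    return std::memcmp(a, b, 6) == 0;
}

// Simplified model of the receive filter decision (assumption: OR mode,
// only the unicast, broadcast and CRC filters are considered).
bool frameAccepted(uint8_t erxfcon, const uint8_t dst[6],
                   const uint8_t ourMac[6], bool crcOk) {
    if ((erxfcon & ERXFCON_CRCEN) && !crcOk) return false;
    // With no address filter enabled the chip behaves promiscuously.
    if ((erxfcon & (ERXFCON_UCEN | ERXFCON_BCEN)) == 0) return true;
    if ((erxfcon & ERXFCON_BCEN) && isBroadcast(dst)) return true;
    if ((erxfcon & ERXFCON_UCEN) && sameMac(dst, ourMac)) return true;
    return false;
}
```

With only UCEN and CRCEN set, a broadcast DHCP reply is rejected in this model; adding BCEN lets it through, which is the effect the broadcast setting discussed above is meant to have.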

If you're also using it on an "enterprise network" then there could be a large volume of packets that may potentially swamp the ENC28J60 causing packets to be missed if its buffers get full. I'd need to check the spec to see what happens in this situation.

Hope this helps for now.

Andy

thiseldo commented 12 years ago

Looking through the code, I can't see anything odd. The setting of the bits to enable broadcast looks correct after working through the flow diagram in the datasheet. I can't actually find any information in the datasheet as to what happens to packets once the head of the circular buffer meets the tail, leaving no space for new packets to be received.

Still looking though.

Andy

vicatcu commented 12 years ago

Presumably that would result in the RXERIF flag being set... See section 12.1.2 of the ENC28J60 datasheet. Perhaps checking that flag would be a means of verifying your theory?
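For reference, the free space left in the receive ring buffer can be estimated from the buffer pointers. The sketch below follows the calculation as I recall it from the ENC28J60 datasheet; the function names are hypothetical, and `wouldOverflow` ignores the per-packet overhead (next-packet pointer and status vector) the chip also stores, so treat it as an approximation rather than the chip's exact rule.

```cpp
#include <cstdint>

// Free bytes in the receive ring buffer, given the buffer bounds
// (ERXST, ERXND), the hardware write pointer (ERXWRPT) and the
// host-controlled read pointer (ERXRDPT).
uint16_t rxFreeSpace(uint16_t erxst, uint16_t erxnd,
                     uint16_t erxwrpt, uint16_t erxrdpt) {
    if (erxwrpt > erxrdpt)
        return static_cast<uint16_t>((erxnd - erxst) - (erxwrpt - erxrdpt));
    if (erxwrpt == erxrdpt)
        return static_cast<uint16_t>(erxnd - erxst);
    return static_cast<uint16_t>(erxrdpt - erxwrpt - 1);
}

// Hypothetical helper: would a frame of frameLen bytes exceed the space?
// (Assumption: per-packet overhead is ignored.)
bool wouldOverflow(uint16_t freeBytes, uint16_t frameLen) {
    return frameLen > freeBytes;
}
```

On a busy network the write pointer can catch up with the read pointer if the sketch does not drain packets fast enough. As far as I understand the datasheet, the frame being received is then dropped and RXERIF is set.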

thiseldo commented 12 years ago

Unfortunately not. Or at least that is my understanding because the only bits being set in EIE register are INTIE and PKTIE. Looks like RXERIE needs to be set for it to register. See line 414 in enc28j60.cpp:

writeOp(ENC28J60_BIT_FIELD_SET, EIE, EIE_INTIE|EIE_PKTIE);

Although looking at the ESTAT register, there is a BUFFER status flag to indicate over/underrun. So this could also be checked.

Looking at the datasheet again, section 12.1.2, checking the RXERIF bit looks like the way to go as you suggest.

Could be a start to try to track down the issues. If problem detected, restart the DHCP process after clearing the buffer.

Andy

thiseldo commented 12 years ago

Change line 414 to:

writeOp(ENC28J60_BIT_FIELD_SET, EIE, EIE_INTIE|EIE_PKTIE|EIE_RXERIE);

then regularly read register EIR:

if (readRegByte(EIR) & EIR_RXERIF) {
    // do something - clear buffer, reset DHCP state etc.
}
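A minimal sketch of what "do something" might look like, using a mock register instead of real SPI access so that it runs on a PC. `MockEnc28j60`, `DhcpAction` and `pollRxError` are hypothetical names for this illustration only. The one assumption taken from the datasheet is that RXERIF stays set until the host clears it, which on the real chip would be a bit-field-clear operation on EIR.

```cpp
#include <cstdint>

constexpr uint8_t EIR_RXERIF = 0x01; // receive error flag, per the datasheet

// Hypothetical stand-in for the chip: only the EIR register is modelled.
struct MockEnc28j60 {
    uint8_t eir = 0;
    uint8_t readEir() const { return eir; }
    // Models a bit-field-clear on EIR: clears only the bits in mask.
    void bitFieldClearEir(uint8_t mask) {
        eir = static_cast<uint8_t>(eir & ~mask);
    }
};

enum class DhcpAction { None, Restart };

// Poll for a receive error; if one is latched, clear it and ask the
// caller to restart the DHCP exchange.
DhcpAction pollRxError(MockEnc28j60& chip) {
    if (chip.readEir() & EIR_RXERIF) {
        chip.bitFieldClearEir(EIR_RXERIF);
        return DhcpAction::Restart;
    }
    return DhcpAction::None;
}
```

Clearing the flag after handling it matters: otherwise every later poll would report the same overflow again and DHCP would restart forever.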

As an aside, looking at the packetSend function, starting at line 430, this actually checks the TXERIF flag, but that has not been enabled either! Could this be another issue, and would the EIE_TXERIE bit need setting above too?

Cheers

Andy

AndrewFischer commented 12 years ago

I added a check for RXERIF, but it isn't getting set. I'll keep looking at the problem....

Line 414 is now

writeOp(ENC28J60_BIT_FIELD_SET, EIE, EIE_RXERIE|EIE_TXERIE|EIE_INTIE|EIE_PKTIE); //AF 21/09/2012

And I added a check to the top of packetReceive

word ENC28J60::packetReceive() {
    word len = 0;
    if (readRegByte(EIR) & EIR_RXERIF) {
        Serial.println("Overflow");
        // do something - clear buffer, reset DHCP state etc.
    }
    if (readRegByte(EPKTCNT) > 0) {
        writeReg(ERDPT, gNextPacketPtr);

AndrewFischer commented 12 years ago

The 20 second timeout is too short for our enterprise network. 50 seconds is about right. If DHCP is going to work at all, it will get an address within 50 seconds.

thiseldo commented 12 years ago

Might be an idea just to set the timeout to 60s anyway, should give plenty of margin.
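As an illustration of the timeout idea, here is a minimal sketch of a wait loop with a configurable deadline. `waitForLease` and its parameters are hypothetical and not part of EtherCard. The clock and the DHCP poll are passed in as callables so the logic can be exercised without hardware; on an Arduino the clock would typically be `millis()`.

```cpp
#include <cstdint>
#include <functional>

// Keep polling until a lease is obtained or timeoutMs elapses.
// nowMs    : returns the current time in milliseconds
// pollDhcp : processes pending packets, returns true once a lease is held
bool waitForLease(const std::function<uint32_t()>& nowMs,
                  const std::function<bool()>& pollDhcp,
                  uint32_t timeoutMs = 60000) {
    const uint32_t start = nowMs();
    // Unsigned subtraction stays correct if the millisecond counter wraps.
    while (nowMs() - start < timeoutMs) {
        if (pollDhcp()) return true;
    }
    return false;
}
```

Making the timeout a parameter with a 60 s default keeps the margin suggested above while still letting a sketch on a fast home network pick something shorter.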

Cheers

Andy

AndrewFischer commented 12 years ago

Good idea. I set the timeout to 60s and pushed dhcp.cpp to my branch. I also added keywords.txt to my branch.

I've done some testing on a different subnet that often fails and I'm seeing Rx buffer overflow errors.

Time to "do something"

AndrewFischer commented 12 years ago

I've run out of time to look at this issue.

I rolled back to a version that will reliably get a DHCP lease. The older library stops working after about six hours (lease renewal?). As a workaround, my sketch does a soft reset every two hours. So far, it has been working fine.

vicatcu commented 11 years ago

AndrewFischer, are you able to try out the pull request from Chreetz? https://github.com/jcw/ethercard/pull/33

Chreetz commented 11 years ago

Hi All,

My code implements RFC 2131 except for rebinding. Rebinding could occur in a situation where there is more than one DHCP server on the network and the DHCP server from which we got our IP address isn't there anymore. The RFC says we should ask another DHCP server whether the lease of our current IP can be extended. My implementation just starts all over again (initiating), which could result in getting another IP address. That will never be a problem in a home network, but might be a problem in an enterprise network.
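For context, RFC 2131 gives default timers of T1 = 0.5 × lease (start renewing with the original server) and T2 = 0.875 × lease (start rebinding with any server). The sketch below only classifies elapsed time into those phases. `LeaseState` and `leaseState` are hypothetical names, and it assumes the server did not override T1 and T2 via DHCP options.

```cpp
#include <cstdint>

enum class LeaseState { Bound, Renewing, Rebinding, Expired };

// Classify where a client is in its lease, using the RFC 2131 defaults.
LeaseState leaseState(uint32_t elapsedSec, uint32_t leaseSec) {
    const uint32_t t1 = leaseSec / 2;            // 0.5   * lease
    const uint32_t t2 = leaseSec - leaseSec / 8; // 0.875 * lease
    if (elapsedSec < t1) return LeaseState::Bound;
    if (elapsedSec < t2) return LeaseState::Renewing;
    if (elapsedSec < leaseSec) return LeaseState::Rebinding;
    return LeaseState::Expired;
}
```

An implementation that skips rebinding, as described above, would in effect go straight from Renewing back to the initial discover step once renewal fails.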

Daisy-chaining switches should be no problem. In fact I tested with a Nanode attached to my laptop with a cross-cable, having my laptop (running Vista) bridging the wifi and physical networks.

I did run into timeout issues. As Andy (thiseldo) stated, it would be wise to choose a longer timeout. In my situation with a WRT54G/DD-WRT, 10 seconds is all right. I don't believe that the RFCs specify a specific timeout setting.

Hope this helps, Chris.

vworp commented 11 years ago

Using the sketches at https://github.com/openenergymonitor/NanodeRF I'm unable to resolve an IP address using DHCP. It doesn't seem to be switch related, I've connected directly to my router with the same result. Increasing the timeout to 60 seconds does not seem to help.

vicatcu commented 11 years ago

@vworp were you able to resolve an IP address using an earlier version of EtherCard?

vworp commented 11 years ago

Yeah, I've got a copy of ethercard from april(ish) which worked with the april version of https://github.com/openenergymonitor/NanodeRF. The testDHCP example sketch worked fine back then too.

vicatcu commented 11 years ago

@vworp can you try to isolate the difference between the 'working' library and the 'not-working' library? The current library works for me with the examples that come with it...

vworp commented 11 years ago

I've just done a basic compare in Notepad++ between the old and new dhcp.cpp. Looks like there have been a lot of changes to the library (the current Nanode sketch will NOT compile with the old library).

vworp commented 11 years ago

I've uploaded a copy of the old ethercard library to https://github.com/vworp/OldEthercard

vicatcu commented 11 years ago

@vworp please can you try the getDHCPandDNS and testDHCP examples from the latest EtherCard and report back on the results? Also please can you provide us with more details about what specific hardware configuration you are using?

vworp commented 11 years ago

Arduino is a Nanode RF with Uno bootloader, board is 6 months old and proven working fine. Connected to a Netgear DGN1000 router, don't normally have any DHCP problems with this router, as I've said the older iteration of the library worked fine with this hardware.

[testDHCP]
MAC: 74:69:69:2D:30:31
Setting up DHCP
DHCP failed
My IP: 0.0.0.0
Netmask: 0.0.0.0
GW IP: 0.0.0.0
DNS IP: 0.0.0.0

[getDHCPandDNS]
DHCP failed
My IP: 0.0.0.0
GW IP: 0.0.0.0
DNS IP: 0.0.0.0

vicatcu commented 11 years ago

@vworp can we come back to the statement "(the current Nanode sketch will NOT compile with the old library)." Are you using the latest Arduino environment (e.g. >= 1.0.1)?

vicatcu commented 11 years ago

@thiseldo do you have a through-hole Nanode RF you could try this with?

vworp commented 11 years ago

Yeah, running Arduino 1.0.1. The current Nanode sketch fails to compile against the old library with this error:

NanodeRF_Power_RTCrelay_GLCDtemp.cpp: In function 'void dhcp_dns()':
dhcp_dns:9: error: 'class EtherCard' has no member named 'dhcpValid'

I'm assuming the current nanode sketch has been modified to suit the changes to ethercard.

vicatcu commented 11 years ago

@vworp I've submitted a new pull request that impacts dhcp significantly. You can get it from https://github.com/vicatcu/ethercard... would you mind trying it out?

vworp commented 11 years ago

[getDHCPandDNS]
My IP: 192.168.1.8
GW IP: 192.168.1.1
DNS IP: 192.168.1.1

[testDHCP]
MAC: 74:69:69:2D:30:31
Setting up DHCP
My IP: 192.168.1.8
Netmask: 255.255.255.0
GW IP: 192.168.1.1
DNS IP: 192.168.1.1

They both work fine. Looks promising.

The Nanode sketch won't compile yet:

NanodeRF_Power_RTCrelay_GLCDtemp.cpp: In function 'void dhcp_dns()':
dhcp_dns:9: error: 'class EtherCard' has no member named 'dhcpValid'

I'll take this back to the original authors, see if they want to update the sketch against your fork.

vicatcu commented 11 years ago

@vworp that's great! yes once the pull request is accepted @openenergymonitor / @glynhudson can make the necessary updates to their examples / code base. In the meantime, you can just delete the references to dhcpValid from your examples and it should "just work."

glynhudson commented 11 years ago

Done :-)

Keep up the good work guys.

jcw commented 11 years ago

This is now resolved, right? Let me know if I closed this too soon.