pfalcon / esp-open-sdk

Free and open (as much as possible) integrated SDK for ESP8266/ESP8285 chips
1.97k stars 624 forks source link

Migrate to upstream lwIP + ESP8266 patchset instead of sloppy vendor code-drop #263

Open pfalcon opened 7 years ago

pfalcon commented 7 years ago

https://github.com/kadamski/esp-lwip existed for a long time, but had following issues:

  1. It was unclear what methodology was used to create it.
  2. At some vendor SDK update, it stopped working, and due to p.1, hardly anyone knew or cared to fix it.

That's why I went for partial solution of https://github.com/pfalcon/esp-open-lwip/ , though rebasing onto upstream was always the goal.

I finally went for that, making followed steps:

  1. Cross-referenced vendor history-less codedrop as captured in https://github.com/pfalcon/esp-open-lwip/ against upstream git (tool to be posted).
  2. The "most matching" revision turned out to be STABLE-1_4_0-RC2.
  3. Diff against https://github.com/pfalcon/esp-open-lwip/ was made then, and all changes were carefully categorized in 3 categories: a) useless dirt, to be dropped; b) changes required to just get lwIP run on esp8266; c) changes beyond that. c) turned out to be backported cherry-picks from newer upstream version and some truly magic vendor changes.
  4. When I had a look at upstream + b) changes from above, and compared to https://github.com/kadamski/esp-lwip , they suddenly appeared to be almost matches. The only difference is approaches: I made shy removals to get to the core of changes (took 2 days), while @kadamski threw away as much as possible, then added something with commits like "without this, it crashes" or "without this, pings don't work". I definitely was impressed by the bravery of the approach, as confirmed by my coward approach from the other side, and went to make https://github.com/kadamski/esp-lwip work with SDK 2.0+.
  5. Of course, nothing worked initially, but in few hours, it did.
  6. Then I proceeded to test how well it works, and oops, standard MicroPython smoke test https://github.com/pfalcon/micropython-projs/blob/master/esp8266-tests/test_dl.py wasn't able to download 6MB even once.
  7. I then switched back to my approach, trying to minimize it to the level of https://github.com/kadamski/esp-lwip. That had the same issues with the test.
  8. Then I paid attention that https://github.com/kadamski/esp-lwip lwipopts.h used TCP_MSS 1460 which is huge and known to lead to memory overflow. Just switching it to lwIP's default of 536 didn't help, but I got an idea that there may be further important config changes dug in improperly made lwipopts.h from the vendor codedrop.
  9. Finally, when those config options were extracted and cleaned up, https://github.com/pfalcon/micropython-projs/blob/master/esp8266-tests/test_dl.py showed much better behavior. I still had a case when it stopped, with WiFi TX buffers exhausted. But currently it has downloaded 100MB and counting.

Results of this work:

  1. Repo with my rebasing approach: https://github.com/pfalcon/lwip-esp8266
  2. My fork of https://github.com/kadamski/esp-lwip : https://github.com/pfalcon/esp-lwip . It's already patches and reworked (original commits are updated, squashed, and patched).

The plan remains to use @kadamski's work, but with my extracted patchset around to investigate any issues.

pfalcon commented 7 years ago

@dpgeorge: FYI.

pfalcon commented 7 years ago

I still had a case when it stopped, with WiFi TX buffers exhausted.

That's definitely a noticeable issue:

Free WiFi driver buffers of type:
0: 0 (1,2 TX)
1: 0 (4 Mngmt TX(len: 0x41-0x100))
2: 8 (5 Mngmt TX (len: 0-0x40))
3: 4 (7)
4: 7 (8 RX)
lwIP PCBs:
pfalcon commented 7 years ago

Suspicion lies on packet retransmission issues, e.g. vendor codedrop has pretty magic code like https://github.com/pfalcon/lwip-esp8266/commit/79dc3436960563b201361917917e0f3c18fd65d8

dpgeorge commented 7 years ago

@pfalcon, ack, nice work! It's a shame thought that it's not possible to use lwip "out of the box". Are there plans to upgrade to lwip v2?

pfalcon commented 7 years ago

Are there plans to upgrade to lwip v2?

Well, the idea is that it will make it possible, but I don't have any specific plans to got that far, more realistic idea is to upgrade minor release by minor release (and even that requires better testing process than we have now).

dpgeorge commented 7 years ago

Well, the idea is that it will make it possible, but I don't have any specific plans to got that far,

Ok. lwip v2 had some changes to how netif's are added/initialised so that makes it difficult to upgrade if Espressif have hidden that initialisation code in a binary blob.

kadamski commented 7 years ago

@pfalcon: Great work. I unfortunately never had time to maintain the code I've made. I'm glad you did the work. Our methodology was very similar, I remember spending a lot of time looking at the disassembly of the Esperssif binary blobs too, though :)

LongHairedHacker commented 7 years ago

Hi @pfalcon, I've been playing around with your reworked fork (https://github.com/pfalcon/esp-lwip) and while it compiles fine for me, there seem to be some missing symbols: os_malloc, os_realloc, os_zalloc, os_free. Those are, as far is I can tell, not provided by esp-open-sdk (I'm using esp-open-sdk based on 2.1.0) and kadamski has defines for them: https://github.com/kadamski/esp-lwip/blob/esp8266-1.4.1/config/lwipopts.h#L40

Are they missing intentionally or am I doing something wrong?

pfalcon commented 7 years ago

@LongHairedHacker : I don't have time to look into that issue now (or soon), but feel free to provide exact steps on how you got to that issue so it can be investigated later.

davydnorris commented 7 years ago

Doesn't seem to be any way to create an issue on the /esp-lwip repo directly so I've had to comment here.

Compiled this last night and got some warnings/errors:

pfalcon commented 7 years ago

@kadamski : Thanks for ping-back ;-).

Our methodology was very similar, I remember spending a lot of time looking at the disassembly of the Esperssif binary blobs too, though :)

Well, I've been spending enough time looking at the disassembly too. And if you look e.g. at https://pfalcon.github.io/xtensa-subjects/2.0.0-p20160809/out.html#esf_buf_alloc , you'll see that it allocates different buffer structures than you saw when you looked at that, and there likely were number of changes across SDK evolution. So, for the above particular work I actually didn't look at the disasm, instead following more black-boxish idea that while there're many changes across SDK versions, only some of them would affect lwIP integration. That's still heuristic and I'm not sure whether to continue digging your version, or stick with 1.4.0-rc2 with more or less complete vendor patchset I did initially.

pfalcon commented 7 years ago

Doesn't seem to be any way to create an issue on the /esp-lwip repo directly

Github by default disables issue tracker in forked projects, considering that any communication should happen upstream. I've enabled it now.

Suggest adding #include "mem.h" to lwipopts.h and using os_malloc/os_calloc/osfree in the defines because vPortMalloc has extra parameters that will cause compile errors when the function is properly defined. the os* versions have those parameters taken care of.

There're various conflicts between lwIP headers and SDK headers. I definitely worked that around for it to be able not just compile, but also run, but it's not fully clean/correct, and might have left in my git stash.

error in compiling dhcpserver.c - dhcpserver.h is missing.

I used dhcpserver from SDK2.0 even for kadamski's upgraded repo. Bits and pieces might have left in git stash too. Generally, the situation with dhcpserver isn't clear - it's unclear what it's licensing is, etc. I'd have an idea to replace Espressif's proprietary version e.g. with one from esp-open-rtos, but that's written with threaded style, so needs conversion to native lwIP API, and then long testing for any possible regressions comparing to proprietary server.

But the current situation is, to remind, that version upgraded from kadamski's work has obvious regressions. So, I'm not sure which version to work with, there're 2 ways:

  1. Take kadamski's version and fix regression.
  2. Take vendor-based version and keep removing patches while checking regressions on each step.

Both ways are long and boring, but only no.2 is sustainable.

pfalcon commented 7 years ago

To xref other work:

  1. @d-a-v's work on "non-OS" SDK upgrade to lwIP 2.0: https://github.com/kadamski/esp-lwip/issues/8#issuecomment-304589345
  2. esp-open-rtos work to upgrade FreeRTOS-based env to lwIP 2.0: https://github.com/SuperHouse/esp-open-rtos/pull/389
davydnorris commented 7 years ago

What i was saying is that if you clone the repo and do the make like in your instructions it breaks. You get warnings (interpreted as errors) about implicit declarations for vPortMalloc etc. and you get an error about not being able to find dhcpserver.h.

If you fix the implicit declaration error by adding mem.h before your define, you then get errors about too few parameters. These are fixed if you use os_malloc instead of vPortMalloc because that expands and supplies values for the extra parameters, and the code is still the same.

If you go looking to fix the missing include, it is there in the tree but is not in the right place. Either it needs to be moved, or the reference in dhcpserver.c needs to be changed to suit.

If you are not expecting the current repo to build then that's fine, but right now it won't complete the build.

The one other thing I did in the makefile was to make it suitable for a non-standalone toolchain by putting in $(SDK_BASE) and pointing to my SDK. I figure it's easier to play with a modified libmain that way while keeping your toolchain operational

davydnorris commented 7 years ago

@pfalcon have you seen the latest lwIP related commits in ESP_NONOS_SDK from Espressif? Does this start to make life easier for us?

BTW I am trying to get some code out right now but as soon as I am done I'll be jumping into the lwIP2 stuff d-a-v has been working on - this is really good.

pfalcon commented 7 years ago

@davydnorris: I didn't, I'm still in "reduced performance" mode. If you have any specific links, those would be welcome.

davydnorris commented 7 years ago

https://github.com/espressif/ESP8266_NONOS_SDK/commit/cf83aba8a5f0a44ce39acb8642904f1d52ce876b

I don't ever remember lwIP source being a part of the SDK, and now it's also in a third party folder of its own with separate build instructions. Is this just what we already know or does this give us some more hints?

c0d3z3r0 commented 5 years ago

There is https://github.com/espressif/esp-lwip now with lwip 2.1.2. I am wondering what I should use ......

d-a-v commented 5 years ago

Maybe you should try @someburner 's fork https://github.com/someburner/esp-open-sdk/tree/sdk2-lwip212 and report here ?

note: lwIP-2.x is not usable with nonos-sdk firmware (any version) without an adaptation layer so espressif lwIP-2.x is not for us (but for esp8266-rtos-sdk or more generally their IDF subsystem)