nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License

RTOS-SDK, ESP32 and the way forward #1319

Closed jmattsson closed 7 years ago

jmattsson commented 8 years ago

Edit: the below progress update refers to the dev-rtos branch which targeted the RTOS-SDK and the ESP31B. With the final release of the ESP32, Espressif abandoned the RTOS SDK in favour of their new IoT Development Framework (IDF). While the IDF is vastly superior to the previous SDKs, it does set our porting effort back a fair bit. Progress updates on the IDF/ESP32 port of NodeMCU can be found further down in this discussion.

With the ESP32 release coming up in a few months' time, it's time to seriously start thinking about the way forward. I think it's a given that we'd all like to see NodeMCU run on the ESP32 as well. With the ESP32 there is only the RTOS SDK however, which means we really need to consider how to get ourselves switched from the non-OS SDK over to RTOS.

Since $work is rather interested in shifting some of our products over to the ESP32 I've had a bit of time to investigate the effort that will be required in terms of NodeMCU. I've been "spiking" over on the DiUS dev-rtos branch to see what I can get going. Here's the overview so far:

If we can get our current NodeMCU to run stable on the ESP8266 with the RTOS SDK, it should be quite easy to get ESP32 support in I believe. If/when I get my hands on ESP32 hardware, I'll have an even better idea.

Oh, and the dev-rtos branch is subject to force-pushing and other unpleasant things, and it is most assuredly not ready for public consumption, but if you want to track my progress you'll see it there.

TerryE commented 8 years ago

@jmattsson I will up the priority on the net over LwIP. I really wanted to go that way anyway, but this was the push that I needed.

About to hop on a Ferry and travelling for the rest of the day, but will post back at the weekend. :smile:

devsaurus commented 8 years ago

Took a fair chunk of work to find a good way to hi-jack the UserExceptionVector, but on the upside it's now also a whole lot faster than the previous one.

Is this improvement also applicable to our current firmware?

jmattsson commented 8 years ago

It could probably be moved over to the regular SDK with some care, but I haven't looked at that.

jmattsson commented 8 years ago

Okay, I'm at a Lua prompt now!

NodeMCU 1.5.1 build unspecified powered by Lua 5.1.4 on RTOS-SDK 1.4.0(c599790)
lua: cannot open init.lua
> 
=node.heap()
39240

The printf doesn't seem to support tabs(?!), because everything seems to end up on its own line, but that's for later... I seem to be running into the auto-bauder leaving things at 74880 until I bang characters at it, even when I've initialized the baudrate to 115200. I don't know what's up with that; @pjsg maybe you have some idea?

Is this the point where I suggest we get this branch into the official repo and let everyone loose on it to try to bang it into shape? After we make sure it's hidden in the cloud builder, of course @marcelstoer .

I'm now eagerly looking forward to getting my hands on the ESP32 dev board from @nodemcu! :)

marcelstoer commented 8 years ago

Hurray Johny, sounds exciting!

Is this the point where I suggest we get this branch into the official repo and let everyone loose on it to try to bang it into shape?

Definitely! However, I suggest tracking the challenges you encounter in a separate issue on GitHub. Otherwise, this one will soon become confusing and hard to follow. It would help if we could define new labels ourselves @nodemcu so we could create RTOS SDK, non-OS SDK or ESP32 to distinguish between the issues.

After we make sure it's hidden in the cloud builder, of course @marcelstoer .

I used to pull active branches from the GitHub API (really cool API) and then blacklisted some. However, a few months ago I switched to statically defining master and dev only. So, no more maintenance overhead and no more fear of things potentially falling over with every new branch.

pjsg commented 8 years ago

On 31/05/2016 00:12, Johny Mattsson wrote:

The printf doesn't seem to support tabs(?!), because everything seems to end up on its own line, but that's for later... I seem to be running into the auto-bauder leaving things at 74880 until I bang characters at it, even when I've initialized the baudrate to 115200. I don't know what's up with that; @pjsg maybe you have some idea?

That sounds very odd. I'll take a look if you tell me your branch name. The autobaud code does not (AFAIK) touch the uart configuration until it detects the baud rate in use....

marcelstoer commented 8 years ago

Philip it's https://github.com/DiUS/nodemcu-firmware/tree/dev-rtos

devsaurus commented 8 years ago

Almost all the various c_ prefixed functions have been consolidated back to standard C library names.

Changing c_puts() to puts() brought along two regressions. Functionality-wise, output redirection is not supported any more with node.output(). c_puts() routed the string through output_redirect() in node.c which triggers an optional callback. On the formatting side it generates additional newlines - libc's puts() unconditionally appends a newline char to the string which c_puts() / output_redirect() didn't do.

This is currently a show stopper for my dev environment with ESPlorer (and potentially other similar tools).

jmattsson commented 8 years ago

Thanks for diagnosing that!

puts() adding a newline is standards compliant, so if we're assuming it doesn't then that's our bug(s). Looking at the output_redirect() I see that it never captured characters printed via putc(), so that redirect was never complete in the first place :(

The old-style redirect we were using won't be possible with the pre-empting RTOS, as we'd at "best" end up running Lua code reentrantly from whichever RTOS tasks are printing. I suspect what we'll need is to install a proper handler via os_install_putc() and queue characters over into the Lua task. Shouldn't be too bad to do - we need to install one of those anyway to be able to implement the system_set_os_print() function.

jmattsson commented 8 years ago

@marcelstoer Thanks, I hadn't realised you switched from black-list to white-list!

I've pushed a dev-rtos branch into the main repo now, and will continue work against that. I consider this a free-for-all branch for the time being - there is so much to test and fix and polish that I'd prefer to risk stepping on toes (and having toes stepped on) than using the PR mutex. I'm contemplating starting a wiki page to document what I learn as I go - would that be sufficiently useful to be worth the effort?

@devsaurus I believe I've taken care of the newline madness now. I've yet to address the output redirect.

devsaurus commented 8 years ago

Thanks a bunch, Johny! That'll allow me to have some test drive with the new rtos flavour. Output redirection is not that important atm, as long as there's serial comm :smiley:

As a side note: I found that you need to manually download and extract ESP8266_RTOS_SDK_v1.4.0_16_02_28 (presumably) to rtos-sdk. Also update init data from rtos-sdk/bin/esp_init_data_default.bin - just to be sure.

jmattsson commented 8 years ago

The RTOS SDK is a git submodule so a simple git submodule update --init should do the trick.

devsaurus commented 8 years ago

node.compile() crashes 100% on an integer build with u8g enabled on top of defaults (doesn't happen for float):

> node.compile("telnet.lua")
Fatal exception (28): 
epc1=0x401008a2
epc2=0x00000000
epc3=0x40105adb
epcvaddr=0x4024e717
depc=0x00000000
rtn_add=0x4010089b
<repeating dump>

Also dofile() is broken. I even got a meaningful error for a small script saying:

"�M�?"(stack_size = 0,task handle = 3fff3bf0) overflow the heap_size.
"�M�?"(stack_size = 0,task handle = 3fff3bf0) overflow the heap_size.
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 26856, room 16 
tail 8
chksum 0xfe
load 0x3ffe8000, len 2392, room 0 
tail 8
chksum 0xd4
load 0x3ffe8958, len 8, room 0 
tail 8
chksum 0x5c
csum 0x5c

jmattsson commented 8 years ago

Both of those look like they're due to stack overflow (again). This seems like it's going to be our biggest pain point in this transition - we have stuff that uses way more than the 2k stack RTOS wants to let us use. If we say each function call uses ~100 bytes of stack on average, the 2k stack limit should still allow us to go roughly 20 functions deep. I suspect something is placing too large arrays or objects on the stack, in a frequently used code path. Help tracking this down would be much appreciated.

Edit: you can increase the nodemcu RTOS task stack size in user_main.c, but that obviously directly impacts free heap, and I don't yet know why the SDK docs state that 2k is the upper limit since we're already running past that.

jmattsson commented 8 years ago

@devsaurus Output redirection should now work again.

jmattsson commented 8 years ago

I increased the nodemcu task stack size to 6k, see if that helps for now?

TerryE commented 8 years ago

I've been pondering this and the issue that we have with Lua is that the execution engine is intrinsically single threaded. Yes, coroutining is supported but this is cooperative and non-preemptive. Yes, on a real OS you can have multiple Lua environments running but these can't interact except through OS mechanisms, and you just don't want to go there on an ESP-class processor.

All of this wasn't an issue with the non-OS SDK since this was also non-preemptive (at least in terms of non-ISR code), but as Johny has pointed out callbacks in RTOS are (or can be) invoked asynchronously in a separate C stack space, and there are a legion of bear traps here.

I think that we should be thinking about extending our model for ISRs and adopt an asymmetric Lua-land / other-land approach. We can't use a symmetric mutex approach because of the Lua process's heavy stack use; we can't (at least on the ESP8266) allow multiple tasks to demand a large stack. I feel that we should think in terms of a 1+N structure where the Lua task is "special" and that all callback tasks do a task post which queues a request to the Lua task when they need to xfer control into Lua-land.

Anyway just musings whilst I build my house. :smiley:

devsaurus commented 8 years ago

I increased the nodemcu task stack size to 6k, see if that helps for now?

Of course, node.compile() works ok now and I didn't hit any other issues so far.

While compiling the fw up and down I got the impression that the Makefile processing is a bit odd. The rtos-sdk/third_party/lwip tree is traversed for each make invocation of a SUBDIR. That's a cosmetic issue in the first place, but also cleaning rtos-sdk/third_party/lwip doesn't work. It caused me some headaches when upgrading my local tool chain.

jmattsson commented 8 years ago

@devsaurus Thanks, I've cleaned up the Makefile now. The clean & clobber targets have also been hooked up to the lwip in rtos-sdk. I appreciate the testing! Scratch that - we shouldn't even be rebuilding the SDK lwIP, that was just a left-over from when I was initially getting things to link.

@TerryE I agree 100%. Thinking of callbacks as being executed in interrupt context is the best (if not only) workable approach. I was briefly entertaining the notion of implementing the lua_lock() function, but considering the stack usage of the Lua VM it is not feasible to execute callbacks from other RTOS tasks, even if we got the VM locked properly (which I suspect would likely have caused some tasks to be blocked for too long while waiting for the VM to become available).

jmattsson commented 8 years ago

I've started a wiki page to track overall progress. Feel free to expand it.

devsaurus commented 8 years ago

I've cleaned up the Makefile now.

Thanks for that - I didn't get the intention with lwip in the first place, but it's definitely calmer now. Though I do want to clean lwip for the time being in order to recompile and check stack usage there as well :wink:

In this respect, please find a ((very) clunky) approach to obtain info from -fstack-usage over at my esp-open-sdk fork. It generates the desired *.su files and they don't look too silly, but I haven't cross checked the results yet with the generated code. It's a start and might need further tweaking - hope it's useful in the end.

jmattsson commented 8 years ago

Ooooh! Building new toolchain now!

jmattsson commented 8 years ago

@devsaurus Not looking quite right yet:

(gdb) disass nodemcu_main 
Dump of assembler code for function nodemcu_main:
   0x40253d38 <+0>:     addi    a1, a1, -16
   0x40253d3b <+3>:     s32i.n  a0, a1, 12
   0x40253d3d <+5>:     call0   0x40253cc0 <nodemcu_init>
   0x40253d40 <+8>:     call0   0x40254fbc <task_pump_messages>
End of assembler dump.

whereas the .su says:

user_main.c:123:13:nodemcu_main 8   static

The stack pointer must always be 16-byte aligned, btw (ref Xtensa ISA reference, p587), and since the return address almost always needs to be stored I'd expect every function to at least have 16 bytes of stack usage.

Edit: rounding up to the nearest 16-byte boundary makes it look pretty good though, so this should be fairly representative:

$ find app|grep .su$|xargs cat | sort -rnk2 |head -n 40
coap_server.c:7:8:coap_server_respond   476     static
lparser.c:411:8:luaY_parser     376     static
lparser.c:611:13:body   312     static
wifi_common.c:8:6:wifi_add_sprintf_field        304     static
encoder.c:37:15:fromBase64      296     static
enduser_setup.c:1002:14:enduser_setup_http_recvcb       280     dynamic
coap_client.c:10:6:coap_client_response_handler 240     static
wifi.c:1168:12:wifi_ap_dhcp_config      192     static
httpclient.c:441:24:http_request        184     static
wifi.c:607:12:wifi_station_config       152     static
ltable.c:527:16:newkey  152     static
lstrlib.c:492:12:str_find_aux   152     static
enduser_setup.c:928:13:on_scan_done     152     static
enduser_setup.c:1459:12:enduser_setup_start     152     static
wifi.c:548:12:wifi_station_getconfig    148     static
lstrlib.c:545:12:gmatch_aux     136     static
ldblib.c:334:12:db_errorfb      136     static
coap.c:314:12:coap_request      136     static
crypto.c:23:12:crypto_sha1      128     static
wifi.c:86:13:wifi_scan_done     120     static
mdns.c:27:12:mdns_register      120     dynamic
loadlib.c:564:12:ll_module      120     static
ldo.c:184:6:luaD_callhook       120     static
ldebug.c:734:6:luaG_runerror    120     static
wifi.c:1005:12:wifi_ap_getconfig        116     static
lbaselib.c:548:12:costatus      116     static
lauxlib.c:84:17:luaL_where      116     static
lbaselib.c:125:13:getfunc       112     static
wifi.c:1018:12:wifi_ap_config   108     static
node.c:593:12:node_stripdebug   108     static
lauxlib.c:54:16:luaL_argerror   108     static
spiffs_nucleus.c:891:7:spiffs_object_append     104     static
spiffs_nucleus.c:688:7:spiffs_object_create     104     static
spiffs_nucleus.c:1132:7:spiffs_object_modify    104     static
spiffs_gc.c:376:7:spiffs_gc_clean       104     static
spiffs_gc.c:234:7:spiffs_gc_find_candidate      104     static
spiffs_check.c:830:7:spiffs_page_consistency_check      104     static
lparser.c:1362:13:chunk 104     static
encoder.c:14:15:toBase64        104     static
wifi.c:377:12:wifi_getmac       100     static

Further edit: And now I see that we're missing the -fcallgraph-info option as well. That's unfortunate, the merger of callgraph and stack usage is what would give us the proper targeting information for reducing stack usage.

devsaurus commented 8 years ago

I'd expect every function to at least have 16bytes of stack usage.

That's very valuable input for a sanity check, thanks. I'll investigate later today to find out how to report gross stack usage.

TerryE commented 8 years ago

@jmattsson Johny, apart from the performance and code size implications, stay well away from lua_lock() for other reasons:

devsaurus commented 8 years ago

Looks better now:

user_main.c:79:6:nodemcu_init   16      static

Although it's still more guesswork than code-fu :blush:

Seems that the 16 bytes penalty is still not considered, but the change fixed a severe miscalculation of the net stack use. The leaderboard changed significantly...

You will now also get *.ci files via -fcallgraph-info :wink:

coap.c:232:13:coap_response_handler     1424    static
lstrlib.c:756:12:str_format     1264    static
lstrlib.c:642:12:str_gsub       1216    static
coap.c:33:13:coap_received      1184    static
struct.c:212:12:b_pack  1120    static
mqtt.c:251:13:mqtt_socket_received      1120    static
mqtt.c:1427:12:mqtt_socket_publish      1120    static
spi.c:68:12:spi_send_recv       1104    static
spi.c:175:12:spi_recv   1088    static
node.c:336:6:output_redirect    1088    static
mqtt.c:1312:12:mqtt_socket_subscribe    1088    static
mqtt.c:1200:12:mqtt_socket_unsubscribe  1088    static
ltablib.c:144:12:tconcat        1088    static
lstrlib.c:122:12:str_char       1088    static
liolib.c:343:12:read_line       1088    static
lauxlib.c:388:24:luaL_gsub      1088    static
file.c:191:12:file_g_read       1088    static
liolib.c:371:12:read_chars      1072    static
lauxlib.c:689:16:luaL_loadfsfile        1072    static
i2c.c:122:12:i2c_read   1072    static
ow.c:194:12:ow_search   1056    static
ow.c:129:12:ow_read_bytes       1056    static
mqtt.c:565:6:mqtt_socket_timer  1056    static
lstrlib.c:90:12:str_rep 1056    static
lstrlib.c:78:12:str_upper       1056    static
lstrlib.c:65:12:str_lower       1056    static
lstrlib.c:54:12:str_reverse     1056    static
lstrlib.c:144:12:str_dump       1056    static
mqtt.c:527:13:mqtt_socket_connected     1040    static
coap_server.c:7:8:coap_server_respond   496     static
lparser.c:411:8:luaY_parser     384     static
lparser.c:611:13:body   336     static
wifi_common.c:8:6:wifi_add_sprintf_field        320     static
encoder.c:37:15:fromBase64      320     static
enduser_setup.c:1002:14:enduser_setup_http_recvcb       304     dynamic
coap_client.c:10:6:coap_client_response_handler 256     static
wifi.c:1168:12:wifi_ap_dhcp_config      208     static
httpclient.c:441:24:http_request        208     static
wifi.c:607:12:wifi_station_config       176     static

jmattsson commented 8 years ago

That's a lot of luaL_Buffer instances on the stack there, each with a 1k buffer (via LUAL_BUFFERSIZE -> BUFSIZ -> stdio.h).

TerryE commented 8 years ago

Reimplement the net module (and others) on top of lwIP API, since espconn is only partially supported on the ESP8266 RTOS, and not at all on the ESP32 RTOS. Probably look at including mbedTLS for TLS support.

@jmattsson Johny, I've just been going through a review of what I'd need to do to the net library to port it in a non-OS SDK + ESP8266 / ESP32 RTOS SDK way, and to that end I've been comparing the documentation for the two RTOS APIs. In short, the ESP32 has a few additions:

It also has some big omissions:

OK, the additions partially reflect new H/W capability, but what isn't clear to me is whether any specific omission is a permanent removal or simply a temporary gap because the ESP32 SDK is still in beta, in which case Espressif would add it back before a V1 production release. I really don't want to spend a lot of effort effectively reimplementing something that gets added back before I am done. I'll have a trawl around the ESP32 forum to see what I can find here, for example: Network Espconn APIs.

jmattsson commented 8 years ago

There's a fair bit more undocumented support on the ESP32, especially in the RTC co-processor area (can't wait to find out how to build code to run that core too!). It's not yet clear which model(s) Espressif will support with the two main cores. I've seen references to both running the entire chip as an SMP RTOS system, but also a "split" version where the WiFi stack runs on the "pro" core and the application has the "app" core to itself. Currently there is only support for running a single-core shared RTOS on the "pro" core.

Regarding omissions, yes, there are certainly some, but they were honestly surprisingly few to me. I don't know if you've been keeping an eye on the dev-rtos branch, but I'm at the point where that branch can now build and link for the ESP32 as well (can't/haven't run it yet though; my ESP32 seems to have trouble with its flash chip - we've ordered replacements for next week to try a swap). The things I ended up ifdef'ing out for now were only espconn, RTC, RF modes and some bus drivers (SPI, I2C) due to hardware differences.

I didn't have to change any of the hardware timer (FRC) stuff, other than provide suitable compatibility macros. Not sure which timer API you are referring to here? Also worth noting that in RTOS the os_timers are very high priority and might be sufficient for some things we've hooked the hardware timer for.

In terms of upgrade APIs, I'm still not sold on the Espressif way. We already have two competing, working implementations for the non-OS SDK version. I'd rather try to make those two compatible with each other and port that over to the ESP32.

For meshing, I'm guessing this will come later, just as it did for the 8266. Considering we haven't used it yet, this should be fine. Besides, the whole ESP-NOW protocol needs to evolve and stabilise a bit further first imo - it's got some serious drawbacks which make any real life deployment challenging.

With the espconn bugbear I did see that mention of a compat layer for ESP32 RTOS, but never for the secure version (there is none available even for ESP8266 RTOS). The TLS library also looks very different. Even if they were to provide both espconns, we'd still have the issue that it's not possible to shut down a TCP server safely. Also, the lwIP native API is a much better fit in the RTOS model, since pbuf ownership is well defined and would allow us to easily and safely transfer it between tasks, giving us lower memory usage. I really don't think any time you spend on transitioning us to native lwIP is going to be wasted, Terry. At worst I could see feature parity, but far better stability and code quality in your implementation.

TerryE commented 8 years ago

I really don't think any time you spend on transitioning us to native lwIP is going to be wasted, Terry. At worst I could see feature parity, but far better stability and code quality in your implementation.

OK, but I think that a sensible compromise for now is to get an LwIP non-OS SDK-based net module working, as we will need this whatever we do.

TerryE commented 8 years ago

That's a lot of luaL_Buffer instances on the stack there, each with a 1k buffer (via LUAL_BUFFERSIZE -> BUFSIZ -> stdio.h).

Picking up this discussion:

So out of this, the one suggestion that I think that we could consider is breaking the LUAL_BUFFERSIZE -> BUFSIZ association.

jmattsson commented 8 years ago

If you've been following the ESP32 news, you might be aware that we got a WIP pre-release SDK drop a couple of weeks ago. I've finally had a bit of time to sit down and go through it. A lot of things have changed, and there's a lot more source available (yaaaaay!). They've even changed the terminology away from "SDK" over to "IDF" (IoT Development Framework) instead.

It does however present a challenge in terms of supporting both the ESP8266 and the ESP32 in the one NodeMCU branch. At this point I have no knowledge whether there will be a similar "IDF" becoming available for the ESP8266. If there is, we'll need to adopt that. If there isn't, I'll need to see whether it's feasible to massage the ESP32 IDF into the NodeMCU build structure somehow. It would probably still need NodeMCU to be based on the RTOS-SDK in such circumstances.

Interesting times...

jmattsson commented 8 years ago

So today, totally unexpected, I got a parcel at work. With an extremely well-packaged ESP32 devboard inside! ./squeee!

Thanks @nodemcu, I assume that was your doing! :)

I'll continue the NodeMCU porting effort now, but in some ways it's like starting over from the beginning considering the massive changes to the build environment that the IDF introduced. Time to read up on Kconfig and see if I can come up with a clever approach to use the IDF framework to build NodeMCU for the ESP32 while using our existing approach for the ESP8266(RTOS)...

So far I've updated the pre-compiled toolchain, so if you're using the dev-rtos branch you can use the tools/toolchain/esp32/bin toolchain with the IDF environment.

jmattsson commented 8 years ago

Tentatively, I'm thinking that going IDF all-out is the best way forward. Kconfig is soooo much nicer than user_config.h/user_modules.h and it could easily take over both those roles. Sure, there may be some grunt-work to get the ESP8266 RTOS-SDK compatible (unless Espressif comes through soon on that front), but I think it would be worth it. Thoughts?

jmattsson commented 7 years ago

I need to sit down and do a proper write-up on all the stuff I've learned, but here's a visual update.

luismfonseca commented 7 years ago

Hmm! What a tasty amount of free heap it has!

pjsg commented 7 years ago

The downside of that much heap is that our current approach to allocating it doesn't work any more (the GC will take much longer). The upside is that we can avoid running the GC so often!

jmattsson commented 7 years ago

Speaking of pros and cons, there is good news and bad news. The good news is that the new Espressif IoT Development Framework (IDF) is really nice to work with. It feels flexible, powerful, and polished (with the occasional unfinished spot). While I've certainly got a soft spot for the cozy hackiness of the 8266 SDK, the IDF is playing in a different league, and I'm really liking what they've done here.

The bad news is that it is so different we will need a dedicated ESP32 branch, at least for the foreseeable future. Back with the RTOS SDKs I think we could've managed a single branch, but even trying to get the ESP8266 RTOS SDK to work together with the ESP32 IDF is something I see as unsurmountable given our resources, let alone the non-OS SDK with the IDF.

I tried taking the dev-rtos branch and "IDF-ifying" it, but honestly, that's just not going to work. Getting NodeMCU onto the ESP32 is going to be a case of carefully lifting each module from the dev branch, tweaking it to fit in with the IDF arrangement of headers and libs, and of course making it RTOS-aware and safe. This is in many ways a lot less satisfying than cutting across large swathes of code which "mostly works", but on the other hand I think it will result in better code quality, quicker.

While I thought we had a pretty good platform abstraction layer, the ESP32 is such a significant upgrade and departure from the ESP8266 way of doing things that it will need a lot of revisiting. The way I see this playing out is that we'll run both dev branches in parallel for some time. As soon as the ESP32 branch is in any sort of reasonable shape (I'm still tidying stuff up over in the DiUS repo before I push it across to here), we'll have to do extra work whenever we're merging in things to ensure it gets applied to both branches to reduce divergence. The sheer amount of stuff that happened on dev compared to dev-rtos made it infeasible to rebase or merge without a whopping big effort, and I don't think we could cope with that if we let it happen this time around.

In the end, I see three or four possible/likely paths:

  1. We keep the ESP8266 and ESP32 branches separate indefinitely, and leave the 8266 on the non-OS SDK. Less work in the short term, but it precludes actual code sharing between the branches for the most part and is thus very costly in terms of effort over the long term.
  2. The ESP8266 branch gets moved to the RTOS SDK, which would enable code sharing between the branches for certain things, hopefully the majority of modules, subject to some #ifdef'ing. This is the middle path, with more up-front effort but only moderate on-going overhead.
  3. Espressif releases an IDF for the 8266. This would be the ideal option, as it would allow us to merge the branches. Switching the 8266 over to the IDF would take some work, but could be done by adding in ESP8266-specific features on the already-IDF'd ESP32 branch. I'd expect it to be a similar amount of up-front work as the previous option, but a whole lot less on-going. This is the option I'm hoping for, to be honest.
  4. The NodeMCU team decides that it's too much effort to ever support the ESP32, and discontinues the porting effort. Definitely not my preferred option, but something that might happen if we don't find the time/people to cope with it all.

Unless we go with the last option, we will need every module maintainer to pitch in with getting all the modules ESP32 ready (excluding stuff which is pure 8266, naturally). Over the next couple of days I'm hoping to document as much as possible of the changes needed. Some is already at the top of this thread, but there's much more I'd like to share to help form a common view of what's necessary. My plan is to put it in-repo in the docs so it's easily findable for any developer who comes in later too. I'll also keep tidying up what I've done so far, and shunt it across into the official repo. Of course, the difficulty in getting hardware will not help with getting this flying, but that pain will pass.

While the above might sound a bit gloomy, I reckon we can make it happen, and I really think we want to make it happen - the ESP32 is one nice chip! It really feels like Espressif took the list of shortcomings/annoyances of the 8266 and just fixed them all. Properly. And the documentation quality is really good (just waiting on the quantity now!)

luismfonseca commented 7 years ago

@jmattsson For completeness' sake, there's also the 5th option:

The NodeMCU team decides that it's too much effort to support both ESP8266 and ESP32, and discontinues the ESP8266. (Obviously also a sad option).

Anyway, I know that I've limited experience in working for NodeMCU but I'm going to help in any effort required towards having NodeMCU on ESP32. :)

mikewen commented 7 years ago

My first reaction was NO, keeping ESP8266 support is much more important! To be honest, I am disappointed with the ESP32. (It offers very little of what I need.)

On second thought, maybe the current NodeMCU is stable enough, with many features. Why not focus only on the ESP32? But my concern is that the ESP32 has not attracted as much interest as the ESP8266 did when it was launched. We can run a simple poll.

TerryE commented 7 years ago

Johny, given that the ESP32 part seems to be shipping for ~ $5 and this will probably fall, I can't see the ESP8266 lasting long. At best it will be shipped as a low cost "sustain" component to support existing production uses. Speaking purely personally, the $1-2 price point isn't important for me. The complete WiFi integrated SoC module is, and the ESP32 seems to address all of the annoyances and constraints of the 8266, so I would personally vote for a switch to the 32 for future development.

The RTOS / IDF stack of the 32 vs. the non-OS SDK of the 8266 will make it very difficult to maintain a common code base going forward, I think, so we've got some hard calls to make ahead.

PS. the stone skin of my new house will be finished in a few weeks and we are waiting for the plastering team to board out, so the silly 6-7 days a week should slacken off soon. I am suffering Lua withdrawal symptoms, so it's getting time to get back on-board and catch up, I think :smile:

mikewen commented 7 years ago

I doubt the ESP32 will be as popular as the ESP8266. And the ESP32 is still at least 1-2 years away from being mass-production ready.

This thread has only had about 5 people since May; that says something.

jmattsson commented 7 years ago

As some of you might've noticed, I've just established the dev-esp32 branch in this repo. It's the "cutting edge" of ESP32 support, and by cutting I really mean I've cut out the 8266 support completely, as mentioned above. I've been building the ESP32 NodeMCU from the ground up, and I believe I've now dealt with all the FIXMEs I had in there, and most of the TODOs. In short, I think it's at the point where others could realistically start helping with the porting effort.

As of writing, that branch has got the console UART functioning (except auto-baud), and I've just finished getting SPIFFS to work today. NodeMCU now uses an explicit partition for the filesystem, rather than magically deducing free space and dropping the fs there (an approach which wasn't a good mix with partitions!). I'm sure @pjsg could polish it further though, and over time we'll need to consider support for embedding a readonly fs within the app, but that's for later.
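To illustrate the "explicit partition" idea above, here is a minimal host-side sketch in C. It is not the dev-esp32 code and not the IDF partition API: the `part_entry_t` struct, the table contents and `find_partition()` are all hypothetical names and made-up values, used only to show a filesystem location being looked up by label instead of deduced from whatever flash happens to be free.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified partition entry (NOT the real IDF layout). */
typedef struct {
    const char *label;   /* partition name */
    uint32_t    offset;  /* start address in flash */
    uint32_t    size;    /* length in bytes */
} part_entry_t;

/* Illustrative table; labels, offsets and sizes are made up. */
static const part_entry_t part_table[] = {
    { "app",     0x010000, 0x100000 },
    { "storage", 0x110000, 0x080000 },
};

/* Return the entry with the given label, or NULL if there is none.
 * A NULL result means "no filesystem partition configured"; the caller
 * should refuse to mount rather than guess at free space. */
const part_entry_t *find_partition(const char *label)
{
    assert(label != NULL);
    for (size_t i = 0; i < sizeof part_table / sizeof part_table[0]; i++) {
        if (strcmp(part_table[i].label, label) == 0)
            return &part_table[i];
    }
    return NULL;
}
```

A caller would do something like `const part_entry_t *fs = find_partition("storage");` and mount the filesystem at `fs->offset` only when the result is non-NULL.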

The next things on my list are to grab some more of the node module functionality, and some of the basic WiFi functions. Once the WiFi is up I'll grab the native-lwip net module across from the PR (or dev if it's made it in). I'm planning on treating this branch as a "cowboy" branch for a while yet, but if y'all disagree and want to start seeing PRs sooner rather than later I can do that too. There might be a lot of those in that case however.

Of the code in the esp32 branch, the one feature I can't yet enable is the FATFS option in the build since I haven't got the sdcard/spi support ported yet. Someone else is most welcome to look at that.

I've started making developer notes in the extension dev FAQ but it's rather light-weight so far. Somewhere I guess I should document the following:

git checkout dev-esp32
git submodule update --init
make menuconfig
make
make flash

which is the TL;DR for this branch. There may need to be something about PATH=$PATH:$PWD/tools/toolchains/esp32/bin, but if so that should get baked into the Makefile really.

@luismfonseca Other than porting modules over from the latest dev, one thing that would be useful is feedback/help with the porting notes. I'm kind of hoping it will grow as people start porting modules and notice shortcomings in said notes. There is also some older stuff on the wiki, of which some should go into the extension dev FAQ, some discarded as it's no longer relevant, and possibly the progress table redone.

Oh, and of course, testing the functionality that has been ported so far is always welcome, but I appreciate ESP32s are still a bit rare.

@TerryE Yeah I expect the 8266 will hang around for years to come, but for new designs the 32 is certainly a tempting option. It's not that long ago that the 8266 was priced in a similar range, and the 32 is a ridiculously powerful chip for the price-point! Good to hear the house building is progressing well; looking forward to having you back on board and butting heads with me over technical details :)

@mikewen Years away from production ready? Nah, chip production is already (finally!) rolling and module production is ramping up. By xmas I imagine dev boards will be freely available. And back when the ESP8266 was launched, few took notice of it. It took quite a while for it to really break into the hacker/maker community, largely due to lack of docs and tools. Espressif is really working with the community this time, and I expect the 32 will get an overall quicker uptake, tbh.

@devsaurus I've been hitting some stack overflows in the Lua thread even with a significantly larger stack, so if you're feeling adventurous you could look at getting the -fstack-usage patches into the xtensa-esp32-elf toolchain as well :) I'd be happy to include some patches in the prebuilt toolchains that are submodule'd into the repo.

TerryE commented 7 years ago

@jmattsson Johny, I've been brooding about the Lua architecture. As you know, NodeMCU is built on eLua, which was built and tested on the assumption that the Lua interpreter is non-reentrant, and hence the VM can only execute a single Lua thread and any multi-threading must be cooperative. IMO, we should stick with this on the ESP32 because moving to a thread-enabled VM is going to require a lot of regression testing and fixing some subtle and unknown dependencies on the single-thread assumption in the RTS libraries.

Given the asymmetric nature of the two processors (though did I notice references to making the latest RTOS versions SMP?) I don't see this as a major impediment. However we need to have some clear guidelines for library writers interacting with Lua-land.

More thought needed :)

jmattsson commented 7 years ago

@TerryE It's almost fully SMP now. There are a handful of things which can only be done from one core or the other, but pinning a driver task/thread to that particular core is trivial (if we ever need to use those things - off the top of my head I can't remember what they are).

And yes yes yes we're sticking with a single-threaded LVM thank-you-very-much! I'm not debugging the monstrosity that would otherwise appear! :D I already ported the NodeMCU task API when I did the original RTOS work, so that side is covered. Getting everyone to remember to post/queue things from within SDK callbacks rather than calling directly into the LVM will be the challenging part. I've tried to cover this in the dev docs I've been writing so far, but it's certainly a point that bears hammering in.
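The post-rather-than-call rule above can be sketched in plain C as follows. This is a rough, host-side illustration and not the actual NodeMCU task API: `post_to_vm_task()`, `drain_vm_task_queue()`, `sdk_scan_done_callback()` and the ring buffer are all hypothetical, and the queue has no locking, so a real port would presumably sit on top of an RTOS queue instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_LEN 8

typedef void (*handler_fn)(uint32_t param);
typedef struct { handler_fn fn; uint32_t param; } msg_t;

/* Hypothetical ring buffer; NOT thread-safe, illustration only. */
static msg_t    queue[QUEUE_LEN];
static unsigned head, tail, count;
static bool     in_vm_thread;   /* true only while the VM thread dispatches */
uint32_t        last_value;     /* what the "Lua side" last received */

/* Callable from a callback: records the work, never runs it. */
bool post_to_vm_task(handler_fn fn, uint32_t param)
{
    if (count == QUEUE_LEN)
        return false;            /* queue full, caller must cope */
    queue[tail] = (msg_t){ fn, param };
    tail = (tail + 1) % QUEUE_LEN;
    count++;
    return true;
}

/* Called only from the single thread that owns the Lua VM.
 * Returns the number of handlers that were run. */
unsigned drain_vm_task_queue(void)
{
    unsigned handled = 0;
    in_vm_thread = true;
    while (count > 0) {
        msg_t m = queue[head];
        head = (head + 1) % QUEUE_LEN;
        count--;
        m.fn(m.param);
        handled++;
    }
    in_vm_thread = false;
    return handled;
}

/* Handler: the only place where calling into Lua would be allowed. */
void on_scan_done_in_vm(uint32_t value)
{
    assert(in_vm_thread);
    last_value = value;
}

/* Stand-in for an SDK callback running in some other context. */
void sdk_scan_done_callback(uint32_t value)
{
    (void)post_to_vm_task(on_scan_done_in_vm, value);
}
```

The point of the split is that `sdk_scan_done_callback()` never touches VM state; the handler runs only once the VM thread gets around to draining the queue.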

I wonder if I could convince Espressif that their APIs should take a "results-posted-to-this-task-please" approach over the current direct callback way...

igrr commented 7 years ago

It will rather be "Results posted to this queue please". We are, indeed, going to move away from callbacks. So I apologize for breakage caused before we reach 1.0.

TerryE commented 7 years ago

@jmattsson Johny, rereading this whole thread I realise that it must seem that I am going senile because I keep repeating myself. :laughing: The problem is one of bandwidth: I need to allocate ~30min every day just to keep up with what is happening on the list, and I just haven't had that time so have got out of sync. I need to do some deep reading to catch up, and avoid stating the already stated. Sorry. I'll drop you an email separately.

jmattsson commented 7 years ago

@igrr That's great news, Ivan! Thanks for letting us know. That approach will certainly make everyone's life easier. Any rough idea on time frames for this to start appearing in the IDF? And which areas might get it first? I'm just trying to get an idea on how I might best plan my work.

@TerryE Hahaha, you're excused! I remember how hazy I was back when I was doing the multi-year reno/build for my house, so I'm not going to judge.

igrr commented 7 years ago

This change needs to land before the 1.0 release, which should happen around Oct 1st. If you have some proposals about the way you would like to see this API, I encourage you to open an issue at https://github.com/espressif/esp-idf. I will be refactoring the startup procedure on Monday; my plan is to remove the app_main callback from the WiFi stack. Instead, let the application provide a normal C-like main, from which one can initialize the WiFi/BT stacks as one would initialize any other driver. I can put the event processing refactoring in the same MR.

devsaurus commented 7 years ago

if you're feeling adventurous you could look at getting the -fstack-usage patches into the xtensa-esp32-elf toolchain as well

Sure, what repo/branch are you building from?