nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License
7.66k stars 3.12k forks source link

Upgrade to SDK 1.5.0 #799

Closed jmattsson closed 8 years ago

jmattsson commented 8 years ago

I'll have a look at doing the upgrade to SDK 1.5.0 over the next few days. Going by the release notes we might have a naming conflict with -lcrypto that will need to be resolved somehow.

TerryE commented 8 years ago

Johny, is this a good opportunity to mirror the approach taken in the Open SDK Makefile with its $(VENDOR_SDK_DIR)/.dir and patch targets? This way the firmware make would simply fetch and unpack the SDK from the bbs.espressif.com site rather than have the files in this repo.

make clean would leave the SDK untouched, but we could have a new target make clean-sdk which would allow you to clean and refetch the SDK if, say, the patch had changed.

Doing these changes are pretty straightforward but still on my todo list. How about rolling this approach into your 1.5 rebaseline? This will get rid of a whole load of denormalised binaries from this repo.

marcelstoer commented 8 years ago

the firmware make would simply fetch and unpack the SDK from the bbs.espressif.com site rather than have the files in this repo.

I agree that this would be a sensible approach - in general.

However, the build will become less stable. Before #499 it would fail every now and then just because Travis failed to download esp-open-sdk.tar.gz from GitHub. Only if everything we need is in the repository we can guarantee consistency when the build starts (Travis does git clone first).

So, based on that experience I'd rather not change that.

jmattsson commented 8 years ago

Do you have any experience how well Travis handles git submodules? Might that be a middle ground?

marcelstoer commented 8 years ago

I don't, sorry. We've first been discussing Git submodules in #595. Since you'd essentially tell Git to fetch or "resolve" submodules from either within the Makefile or .travis.yml (no ideal because that's not used outside Travis) it'd be rather a Git-issue than a Travis-issue. That in turn probably means we can expect more stability.

ghusson2 commented 8 years ago

Hello guys, I am waiting since 2 month the last SDK to be integrated in nodemcu (beacuse I had serious bugs over 0.9.5 one, and I am standing by until SDK update). I was asking if nodemcu was going to fall... A fresh SDK will boost nodemcu usage as LUA/nodemcu is very simple to code (ex : a telnet console, in parallel with normal scripts : http://fil.crepp.org/pages/view/51/nodemcu) and as the current SDK is buggy and lacks some features.

I won't be able to help you. But I encourage you really ! Good luck jmattsson and helpers ! This would be a nice Xmas gift, really :)

nickandrew commented 8 years ago

@ghusson2 The dev branch is already SDK 1.4.0 so you should try it until a 1.5.0 build becomes available.

madhurjain commented 8 years ago

I've built a project using NodeMCU 0.9.6 SDK which uses WiFi. I am to present this project at the University which has a WPA2-Enterprise WiFi. SDK 1.5.0 adds support for WPA2-Enterprise. Is there a possibility of NodeMCU Firmware with SDK 1.5 being released anytime soon? Or is there a way I can integrate SDK 1.5 with the current dev branch of NodeMCU just to add WPA2-Enterprise support?

TerryE commented 8 years ago

@madhurjain, sorry but we are going as fast we can. We don't offer a per-user service.

madhurjain commented 8 years ago

Thanks for your reply @TerryE I am just trying to understand the complexity of adding SDK 1.5 support to nodemcu and if there's something I can do quickly by forking this repo so I can connect my device to WPA2-Enterprise.

karrots commented 8 years ago

One thing to be aware of is SDK 1.5 only supports certificate based auth. So uf your using PEAP with MSCHAPv2 for the EAP type on your wireless it still won't work.

madhurjain commented 8 years ago

@karrots Ahh. I am in fact on MSCHAPv2. Thanks for the clarification.

ghusson2 commented 8 years ago

@nickandrew : ok, I didn't know. I think this is because I didn't found the way to look at the BSP number, and because the README.md was not up to date. Now I see main branch is 1.4, I will test it soon, thank guys :) RQ : if the dev branch is on BSP 1.5, please update the readme in this branch ? An other remark, the wiki and changelog on it was not updated since 2015-03-31. I know those docs won't make the project go forward, but it is important for noobs like me to have an idea of the project (status, community, improvements made and futures ones).

marcelstoer commented 8 years ago

An other remark, the wiki and changelog on it was not updated since 2015-03-31

There are issues for that. See https://github.com/nodemcu/nodemcu-firmware/labels/documentation.

those docs won't make the project go forward

Oh, they definitely will! That's why we did and will put more effort into that.

nickandrew commented 8 years ago

@marcelstoer I could just delete the changelog from the wiki? I don't think these changelogs have any use - not to new users at least - and more accurate info can be found by perusing the closed pull requests.

It's not much but removing the changelog will not mislead new users into thinking the project is stagnant because it hasn't been updated for 9 months.

marcelstoer commented 8 years ago

I could just delete the changelog from the wiki?

See #827.

ghusson2 commented 8 years ago

@marcelstoer, you are right. Precisely, doc don't put forward the code, the product. But it helps to give a good representation of the project. Thus it tends to increase its visibility among others similar project and can attract other users and contributors. If the changelog is not maintained up-to-date, it should be deleted I agree. As a user, I will look at the issues if I have an issue myself. And what is important for an end user (in addition to the user manual/tutorial) is a roadmap with a following of what has been implemented (ie : what has been improved since the beginning, and what we are working on). A good exemple I know for that is Proxmox VE (simple and efficient : roadmap/release history) : https://pve.proxmox.com/wiki/Roadmap.

jmattsson commented 8 years ago

Life & work has been very busy the couple of weeks so I haven't had a chance to work on this yet. Next week should be better.

I'm still torn on what the best approach here really is. I agree with Terry that the SDK itself doesn't belong in this repo, nor does a pre-compiled toolchain for linux/amd64. However, at the same time the current situation means that the build is fully contained and reproducible, locked to the expected SDK revision, and relying on no external items.

Right now I'm considering setting up a nodemcu-build repo containing both the scripts to build the toolchain and patch the SDK, as well as the resulting artifacts, then submodule that repo into nodemcu-firmware. That would have the advantage of getting the SDK & toolchain out of this repo while still keeping the CI build well contained, but more or less just pushes the mess into the new repo instead. I think that's still preferable though. It might be that I could avoid getting artifacts into that repo, depending on how well I can instruct the CI build to not clean that part between builds.

eyaleb commented 8 years ago

I agree. We should have a repo with all the build dependencies, not only for the CI but also to be able to say "can you reproduce this problem with the reference build? Using what settings?". In the past I had my own VM, toolchain (compiler etc.) and got the SDK from espressif. I had no confidence that I am using the correct collection that the fw was built against (the README was not very specific). I now use my usual workstation with everything off the nodemcu-firmware repo, no guessing.

nodemcu commented 8 years ago

+1, separate toolchain and SDK is a good idea I think. there will be needs that use different toolchain and sdk, we can setup a default and make SDK=... for alternative. btw, just received the esp32 beta board from espressif this morning, since esp32 non-os sdk is not out and no plan heard from espressif, maybe NodeMCU port to esp32 will go for rtos sdk. fortunately rtos sdk has same api with non-os sdk.

jmattsson commented 8 years ago

@nodemcu lucky you! :) We're still waiting for our beta board!

nodemcu commented 8 years ago

@jmattsson I'm in a city nearby:) since the new chip esp32 is much more capable. we will gonna make a new NodeMCU dev-kit. I raised an issue Suggestions about new NodeMCU devkit based on ESP32 at repo devkit just now.

jmattsson commented 8 years ago

Just a note to let people know I've started working on the SDK 1.5.0 upgrade & build/toolchain rejig.

jmattsson commented 8 years ago

Okay, I've finally gotten something that seems to work. You can have a look at my branch sdk150_build. It now submodules in another repo, nodemcu-build, which holds the SDK (1.4.0 + 1.5.0 supported) as well as the esp-open-sdk toolchain, in source format. The SDK copy which was in the nodemcu-firmware repo has been dropped.

The good:

The bad:

The ugly:

I'm not going to PR this right away, because I'm really not sure this is the right way forward. I think we should have a thorough discussion about it, and test/bash the living daylights out of this approach before we commit to it. If we'd like to get SDK1.5.0 into NodeMCU sooner than that, I can quite easily do that upgrade on our existing structure.

marcelstoer commented 8 years ago

The whole thing doesn't feel quite right to me.

Being the original author of the PR which added the esp-open-sdk.tar.gz to our repository I have to agree with you :wink:.

I believe you won't find anyone who'd disagree that in general the submodule approach, or something similar, is preferred over hosting our own copy of external binaries. In this particular case though the list of bads and uglies is just way too long.

Of course I'm biased. We're currently at over 120 people who use the cloud build service per day. Anything that increases the build time for them is, well, not ideal. Why should we have to build the same toolchain over and over again?

How about the middle ground? We could have a dedicated repository that hosts the binaries and just submodule those ready-made artifacts.

jmattsson commented 8 years ago

Here is a plain SDK+LWIP upgrade without rejigging any of the build stuff. As before, an unmodified SDK lives in the sdk/ directory, with a couple of compatibility fixes on the side over in sdk-overrides.

jmattsson commented 8 years ago

@marcelstoer I'm very tempted to say that we're already at the middle ground with what we have. Splitting off the toolchain tgz into a submodule'd repo just seems to add complexity for the sake of principle. There's still prebuilt binaries in a repo at the end of the day. The only time I think there would be sufficient gain is if we simultaneously filter-branch'd away any historical trace of that tgz from the nodemcu-firmware repo, to reduce the repo size. That's quite a sledgehammer approach though, and I'm not going to advocate for rewriting history to that extent.

TerryE commented 8 years ago

. Splitting off the toolchain tgz into a submodule'd repo just seems to add complexity for the sake of principle

Sorry but I differ on this. It is more that a principle. We've got a whole cluster of elements in our repo that (a) we have no control over and (b) clutter up and bloat it for Lua developers that want to fetch the repo to do local builds. Not the Right Thing to do

jmattsson commented 8 years ago

Sure, but removing the tgz does not reduce the size of the repo. It is always part of the history, and you'll spend the time cloning it. You need to filter-branch the whole repo and eradicate it from the repo history if you want it truly gone. Having had to do one of these at $work a few years ago, I am not volunteering to attempt that. Plus of course, the entire repo history becomes less than truthful, and it becomes impossible to actually go back in time and rebuild things as they were at the time.

TerryE commented 8 years ago

Sure, but removing the tgz does not reduce the size of the repo.

1.5, 1.5.1, 1.6,...

Yes, we have to live with historic blobs because of the reasons that you detail, but this is too simple a generalisation. Whilst the git diff algos generate extremely space efficient compression of source-style files, they aren't applied to binary files. It is extremely difficult to do space and time efficient delta compression of binaries without some model of the contents (e.g. is it zip file, tag.gr, db image, etc. because binary files usually contain internal links which relocate from version to version or use adaptive LZW-style encoding). So in practice if the SDK is roughly 10 Mbyte then the repository will grow 10 Mbyte for each version added. I think that we should avoid this growth.

So I believe that both my previous comments, both (a) and (b), are still valid. As a compromise why not stay with including the SDK _zip_ inside our repo for the 1.5 release and look at sorting it out for the next release if one of us has the time and energy. At least this way the repo will on gorw 10% in size for this release.

It is at least a single file and with a defined origin at the vendors website, in this case bbs.espressif.com/download/file.php?id=989.

marcelstoer commented 8 years ago

Splitting off the toolchain tgz into a submodule'd repo just seems to add complexity for the sake of principle.

I'm with Terry on this (for the reasons he stated). Had I, or those who supported & merged my initial PR, understood all the consequences and had I anticipated back then how frequently Espressif release new SDKs I'd have proposed submodules right from start. Outsourcing the binaries into a dedicated repository doesn't add a lot of complexity IMO. As long as we stick to standard Git features to achieve that I'd argue it's justifiable.

As a compromise why not stay with including the SDK zip inside our repo for the 1.5 release and look at sorting it out for the next release...

I don't see why we should postpone that. From what I understand Johny has already done most of what'd be required?

TerryE commented 8 years ago

The proposed approach has:

As before, an unmodified SDK lives in the sdk/ directory, with a couple of compatibility fixes on the side over in sdk-overrides

This is far worse than a zip of the SDK in our repo because (a) the zip is smaller than the unpacked SDK, and (b) it is a single configuration item that easily verifiable against the source file.php from Espressif's pbpBB3 board, rather than the 145 files and 35 directories added by exploding it.

jmattsson commented 8 years ago

Whilst the git diff algos generate extremely space efficient compression of source-style files, they aren't applied to binary files.

Git doesn't use delta compression, you must be thinking of svn or some other vcs. Git stores each version of an object, and does on-the-fly diff'ing. And git does compress binary files (and text files), using zlib (just like a .gz). Google git packfiles for the low-down. That, together with the fact several of the libraries in the SDK might not change between versions (737k of libraries stayed unchanged from 1.4.0 to 1.5.0), might make it a false economy to stash the SDK zip file.

Having said that, I'm not necessarily opposed to that approach, and it would be quite easy to change to. If everyone agrees on at least that much I can do another pass to prepare that. It does seem orthogonal to the toolchain argument.

(b) clutter up and bloat it for Lua developers that want to fetch the repo to do local builds.

I still don't understand your reasoning here. Whether submodule or in-tree, it has to be fetched in order to build. The only time you'd have an actual saving here is if the CI-toolchain was never part of the tree, and only submoduled. Those not needing the toolchain could then opt not to initialise the submodule. For all the scenarios we're currently discussing, there is no time or space saving, only the added hassle of a submodule.

jmattsson commented 8 years ago

Here is another branch which has the 1.5.0 upgrade in its zipped form.

TerryE commented 8 years ago

Johny, sorry. Yes, I came from svn and you are correct git that each stores each object separately and only uses delta compression optimise network traffic. But a 7% reuse of objects / saving doesn't merit unbundling, IMO.

Maybe I am just stupid but what is wrong with the open-sdk method of just doing a check of the cache based on the presence and SHA1 of the zip file and on failure doing a wget --tries=10 --timeout=15 --waitretry=30 --read-timeout=20 --retry-connrefused --continueor a curl --retry=10 -C or the like from the Espressif site on a cache miss, and then we don't need the file in the repo at all. What am I missing here?

But I'll pull your 1.5 branch down tomorrow and give it a run. Thanks Johny :)

I do a PR for my configure script (I'll stick to bash) as an optional extra. I will write it anyway, and I suspect that most developers will find it easier and more convenient. I certainly will. At the moment I have to stash my user_*.h files and recheck them out before doing a commit / PR. then unstash them again. What a pain. Just having a $ gen_user which takes a small config file and does this for me is far better in my view. Having it in the tools directory makes its use entirely optional, and allows developers to continue editing header files to define their configuration if that is their want.

Edit: Life's too short.

TerryE commented 8 years ago

Sorry I thought that you'd left out updating the NODE_VERSION_MINOR, but that was my cockup. I'd saved the user_*.h files and check the config and modules before writing them back, forgetting that I was also over-writing the version file. Leaving the BUILD_DATE as unspecfied is probably OK whilst were in dev

TerryE commented 8 years ago

I was going through the 1.5 Rel notes and saw the caveat about makes must include -lcrypto and I remembered that we already had a crypto library in the app, so I checked for duplicates, and there are two copies (name clash) of the following libraries crypto, json, lwip and upgrade, one in the sdk and one in the crypto, json, lwip and upgrade directories, respectively. OK, both versions are pathed in separately (except the SDK upgrade library which isn't used in the firmware), but I am still uncomfortable with the name clash.

Confused from Northamptonshire

jmattsson commented 8 years ago

I'm not quite awake here anymore, but I'll take a stab at some answers:

I'm not overly keen on having our names crowded out by added sdk libraries (there were two this release, crypto and wpa2). Should we simply repack all the sdk libs into one sdk.a and be done with it? An extra second or two in link time might be worth it to avoid name clashes? (I reserve the right to consider this a Really Dumb Idea(tm) tomorrow when I'm awake btw, just sayin')

creationix commented 8 years ago

For what it's worth, git does use delta compression in it's pack file format and there is no distinction between text and binary files, it treats them all the same. This is used for both the network protocols and for internal persistence. Small repos will start out with every object in it's own deflated file, but eventually some threshold will trigger (or you can run git gc manually) and it will be packed into a packfile.

That said, I would highly recommend submodules. They are very elegant from an internal git perspective, it's just the high-level "chrome" that gets them wrong and makes the UX look bad.

TerryE commented 8 years ago

@jmattsson Johny, so it's your turn to post when almost dead, eh? :sleepy:

Also on a general point, IMO we shouldn't do a full extract of the SDK Zipfile, but only what we need

$(TOP_DIR)/sdk/.extracted:
    mkdir -p "$(dir $@)"
    (cd "$(dir $@)" && unzip $(TOP_DIR)/cache/esp_iot_sdk_v$(SDK_VER)*.zip \
           -d $(SDK_VER)/lib  -d $(SDK_VER)/ld -d $(SDK_VER)/include )
    rm  --f $(SDK_VER)/lib/liblwip.c  
    touch $@

@creationix Tim, for other readers here is a discussion: Git Internals -- Packfiles, but my comments about binary deltas still apply: most algos give poor delta performance on files such as libraries.

Have you though of giving a hand here by testing / reviewing Johny's contribution:

git fetch https://github.com/jmattsson/nodemcu-firmware.git sdk150bin:sdk150bin
TerryE commented 8 years ago

I've also got a few PRs that I want to do but want to wait until we've merged a working 1.5 into dev.

jmattsson commented 8 years ago

So based on the earlier discussion, should I do any of:

Marcel, you're the one most directly impacted if I do the wget I think, what are your thoughts? Can you keep it cached across builds? In Travis we could simply designate the cache dir for Travis-caching, so it's really the cloud builder that I'm concerned about.

Having a bit of further cleanup and tossing out old crypto/json/whatnot sounds good, regardless.

TerryE commented 8 years ago

Sorry if I sound like a pissant, Johny. It's just that this stuff does merit proper review and I seem to be the only one willing to do it :cry: (1) and (2) are really Marcel's call because this will impact the stability and timing of cloud builds. Maybe we take this off the critical path for getting a 1.5 out.

As far (3) goes, in reality so long as we don't have duplicates like json and upgrade, aggregating libraries probably won't make a material impact to timing. So up to you. What slight concerns me more is where we and Espressif have chosen colliding namespaces (e.g. cyrpto_*) But again, I looked for collisions this time and didn't find any, so sorting this out isn't urgent.

BTW. I've done crypto.fromBase64() but I will leave off pushing it until 1.5 is merged. Ditto the GPIO fix.

marcelstoer commented 8 years ago

Marcel, you're the one most directly impacted if I do the wget I think, what are your thoughts?

They haven't changed much... To me it's, now, pretty clear what is "right" and what is "wrong", it was my PR that eliminated wget from .travis.yml. I did so because the builds kept failing frequently due to wget timeouts/hiccups/whatever between Travis and GitHub where we used to get the pre-built toolchain from.

... so it's really the cloud builder that I'm concerned about.

Thanks for caring but the cloud builder is Travis :wink:

Ideally I wouldn't have different build scripts for different branches. That shouldn't bother you too much, though. Should I need to change them I'd appreciate if merging an upcoming PR will be a concerted action with me releasing new build scripts at the same time. I wouldn't want to have dozens of failed dev builds just because merging happened when I was absent.

TerryE commented 8 years ago

Given the fact that I would like to do a few other bits and pieces, is this a good opportunity to merge this into a dev150 branch on nodemcu? It should only take us a few days to sort this out -- or maybe weeks, given the upcoming holidays. We can freeze PRs to dev for a couple of weeks. That way the current dev cloud build will be OK, and we can move dev150 over dev when we are happy.

TerryE commented 8 years ago

Re dev150: I propose to do this tomorrow based on @jmattsson Johny's sdk150bin branch. However, Marcel, I recommend that we do not include it in the cloud-builder selection list or we'll just get premature issues raised by Lua devs. We can add any new features there and when we are comfortable switch it over to dev. Please comment back if anyone has an issue with this suggestion.

jmattsson commented 8 years ago

@TerryE I added the reduced unzip to the sdk150bin branch.

I've also done a test with the dreaded wget, available in the sdk150dl branch. The pros include it not having the SDK zip in the NodeMCU repo, and the cons include it not having the SDK zip in the NodeMCU repo. I did however configure Travis to cache the cache/ dir, so Travis only has to download it once. It seems to work well. @marcelstoer could I get you to have a play with this branch and see if you would be satisfied with its workings? Personally I'm on the fence between the sdk150bin and the sdk150dl branch, both approaches have merits imo.

As for a dev150 branch, do we really need it? master is alive and well and we could snapshot dev onto it before we start with 1.5.0, to keep a recent 1.4.0 around. If we do do a dev150, can I suggest we start piling in the new documentation support in there too?

nickandrew commented 8 years ago

@TerryE, I don't think a 1.5.0 SDK should be added to the repo at any time, i.e. it should be sorted out whether it's obtained with wget or via git submodule before making a dev150 branch.

@jmattsson Would you be able to rework your sdk150dl branch into two commits, please? One to delete sdk/ and the other to contain all the changes you had to make to get 1.5.0 working; that will be much easier to understand (just read the 2nd commit and ignore the 1st).

TerryE commented 8 years ago

I am happy with either to be honest. I've got a small preference for the dl version. If the wget that unreliable when cached and with --tries=10, etc.? Perhaps the only thing that i'd check would be to put an md5 or even just a size check on the download, just to recover from the odd scenario when we did get a brownout from China.

One other little comment I notice (a) that you are using gz format for the open-sdk tarball. I know xz -9 compression is as slow as hades, but the resultant compressed tarball is 14Mb rather than 34Mb and decompression time is pretty much the same time as gz decompression. Also, I take it that this is an Ubuntu 12.04 binary build compatible with the Travis CI service so it makes sense only to include the bits that we need. For example do we need xtensa-lx106-elf-gdb, cc1plus or libstdc++.a as these combined are another third of the tarball? Just a thought to keep the repo small for pulling the latest version.

Oh yes, and it goes without saying that whatever the pros and cons of separating out the SDK tarball also goes for the open-sdk tarball as this is the biggest chunk of the repo by far and is only Travis CI specific.

TerryE commented 8 years ago

@marcelstoer, I've been using the sdk150bin branch and haven't found any glaring issues with it. I haven't tried the dl variant, but I don't foresee issues. Given the why "dev150"? quotes, what I am suggesting is to update current master to dev so that master is the last dev140 build (there's only 4 minor commits difference anyway), and then move sdk150bin/dl to dev so that this can unlogjam the 1.5 dependent changes that I have in train for PRs.

Can you flip a coin and tell me which to use?

TerryE commented 8 years ago

OK, Master is now at the latest change compared to Dev. There were a couple of minor commits in Master that I had to back-sync first. Not sure why. But we can now start to add 1.5 changes to dev.