nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License
7.64k stars 3.12k forks source link

Reduce the size of the luaL_Buffer object #1523

Closed pjsg closed 7 years ago

pjsg commented 7 years ago

Missing feature

The current luaL_Buffer consumes 1k bytes of stack space. This is a lot on the ESP8266.

Justification

Not much stack available!

Nearly all the code deals with the buffer size not being large enough. This issue is to fix any errors and then reduce the buffersize to (say) 128. Once the issues are fixed, then it should be possible to run with much smaller buffer sizes (e.g. 4) and the system should still work -- albeit a bit more slowly.

TerryE commented 7 years ago

As discussed in #1289, I think that this is the wrong way to go. I feel that the initial buffer length should be made explicit and set at runtime, with the buffer itself allocaed, hence the preamble of str_reverse()

static int str_reverse (lua_State *L) {
  size_t l;
  luaL_Buffer b;
  const char *s = luaL_checklstring(L, 1, &l);
  luaL_buffinit(L, &b);

would become:

static int str_reverse (lua_State *L) {
  size_t l;
  const char *s = luaL_checklstring(L, 1, &l);
  lua_buffer(L, b, l);

So this would allocate 18 bytes on the stack if the input was 18 bytes long, 1024 bytes if 1024 bytes long, etc.. There would only be a memory / performance impact v.v. the current implementation if the actual string grew to be factors longer than the initial allocation.

For the few spot uses where the code really needs the buffer to be 1024, then we simple use 1024 (or BUFSIZ) in those situations.

TerryE commented 7 years ago

@pjsg, @jmattsson, I am taking a couple of weeks break over in Greece to close up our cottage for winter. Jan is staying home to keep an eye on the build (her turn), so I'll have lots of time to program. Do you want me to do the alloca version or do you prefer to do simpler fixed size reduction?

pjsg commented 7 years ago

I was going to do the simpler, size reduced version.

What I was interested in was to monitor the depth of the stack -- by filling the empty stack area with a specific pattern and then checking to see how much of it gets overwritten during normal operation. I have no feel for how much trouble we are in. In particular, with DEVELOP_VERSION enabled, I think that we are already blowing the stack. It used to work, so maybe this is a 1.5.4.1 issue. Maybe c_printf has gotten a lot more stack hungry.....

TerryE commented 7 years ago

That's hardly surprising. Have you had a look at c_stdio.h?

#define c_printf(...) do {                  \
    unsigned char __print_buf[BUFSIZ];      \
    c_sprintf(__print_buf, __VA_ARGS__);    \
    c_puts(__print_buf);                    \
} while(0)

In other words any stack frame which includes any NODE_DBG statements is 1024 larger when DEVELOP_VERSION enabled. Crazy on the ESP8266. If we want to KISS, then maybe we pull c_printf out of line, use a static buffer within the routine and a reentrance flag to inhibit calling any nested NODE_DBG arguments being functions which include NODE_DBG statements.

pjsg commented 7 years ago

Oh wow. That is horrible. I'm not sure that I want to burn 1k of RAM for c_printf though...

Is there just a version of printf that doesn't use a buffer?

TerryE commented 7 years ago

BTW I've just checked on my laptop (not my devel VM so the counts are only rough) but there were ~400 / 70 NODE_DBG and NODE_ERR calls respectively. Of these only 110 / 20 have arguments. So as well out pulling cprint_f back into c_stdio.h, we should consider a short-circuit to use c_puts if only one argument is specified. And given that we don't use an nprint variant we should at least check to ensure that the length of the output buffer isn't greater than the allocated size and throw a fatal error if this happens.

TerryE commented 7 years ago

1K is far better that 7×1K or whatever. Ideally we should use c_snprintf -- that is a variant where we specify a hard maximum length, except that the c_stdio.c code doesn't include this, so the code will happily buffer overrun. In practice this is most likely to occur if the format statement includes an unlimited %s specifier, but there are only a dozen of those in the source. Again if the function was out of lined, then we could roll up a maximum output length, so you would have a better idea if 512 or 256, say, was a safe size.

jmattsson commented 7 years ago

All this talk about c_xyz() is making me cringe... I managed to pull all of that stuff out on the RTOS branch, and then I did it all over on the IDF.

Also, for doing stack analysis, please do give @devsaurus' -fstack-usage patches for the toolchain a go; Having the static stack usage available for reference is incredibly useful.

pjsg commented 7 years ago

Can we just us os_printf or ets_printf instead? This would probably save a good amount of stack.

jmattsson commented 7 years ago

No, that's what the chattery SDK uses for all its debug output, and we turn that off first thing in user_init() with system_set_os_print(0).

jmattsson commented 7 years ago

If you want a quick-hack for the c_printf() macro, I suspect you could declare a single, global buffer that gets used all the time. Nothing should print from ISRs during normal operation anyway, and for debugging I think it's preferable to get some garbled output versus chasing Heisenbugs introduced by printf buffers. Just a thought.

TerryE commented 7 years ago

All this talk about c_xyz() is making me cringe...

Whether we wrap the c_* calls by defines or not, so long as we support the ESP8266 we are constrained by its RAM and subroutine availability. Using the unchecked versions can crap over the stack for an entirely avoidable reason. That makes me cringe 😄

TerryE commented 7 years ago

If you want a quick-hack for the c_printf() macro, I suspect you could declare a single, global buffer

:+1: ditto to my suggestion above. There's also no point in inlining this code sequence with a macro. By moving the c_printf code back to c_stdio.c we can detect reentrancy into the printf routine and suppress the inner call, plus detect buffer overrun.

pjsg commented 7 years ago

We either want to add snprintf (so that we can prevent a buffer overrun) or we want an fprintf (like) function that can output directly and doesn't need a buffer.

The current BSD implementation of vfprintf looks large, but it would allow direct printing with no buffer and it could be wrapped to provide an snprintf implementation....