wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License
6.87k stars 552 forks source link

wrenCompile fails to initialize some fields of the parser #1028

Open graphitemaster opened 3 years ago

graphitemaster commented 3 years ago

The parser fields previous and current are left in an uninitialized state when wrenCompile calls nextToken. This is caught by -Wuninitialized in gcc, but also breaks Wren when built with ThinLTO as the first two assignments in nextToken are removed from the generated code.

This is an amalgamated build but the relevant warning can be seen here

src/lib/wren.h:5186:20: warning: ‘parser.Parser::current’ is used uninitialized [-Wuninitialized]
 5186 |   parser->previous = parser->current;
      |   ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
src/lib/wren.h: In function ‘ObjFn* wrenCompile(WrenVM*, ObjModule*, const char*, bool, bool)’:
src/lib/wren.h:7901:10: note: ‘parser’ declared here
 7901 |   Parser parser;
      |          ^~~~~~
mhermier commented 3 years ago

This is odd, I run wren in valgrind from time to time, and the issue never showed up. Are you sure GCC is not a little bit optimistic at finding the error, and it is in fact impossible to happen?

graphitemaster commented 3 years ago

It's correct, wrenCompile starts like

  Parser parser;
  parser.vm = vm;
  parser.module = module;
  parser.source = source;

  parser.tokenStart = source;
  parser.currentChar = source;
  parser.currentLine = 1;
  parser.numParens = 0;

  // Zero-init the current token. This will get copied to previous when
  // nextToken() is called below.
  parser.next.type = TOKEN_ERROR;
  parser.next.start = source;
  parser.next.length = 0;
  parser.next.line = 0;
  parser.next.value = UNDEFINED_VAL;

  parser.printErrors = printErrors;
  parser.hasError = false;

  // Read the first token into next
  nextToken(&parser);

Nowhere in here is parser.current or parser.previous initialized, the stack allocation of Parser parser; leaves it uninitialized, the very first call to nextToken then does, as the first two statements in the function

// Lex the next token and store it in [parser.next].
static void nextToken(Parser* parser)
{
  parser->previous = parser->current;
  parser->current = parser->next;
  // [snip]
ruby0x1 commented 3 years ago

nextToken is called twice, which copies next -> current and current -> previous so both should be set but there's no particular reason I didn't initialize them. It'd make sense to do so cos the compiler can't see the order of init there and being explicit is better.

ChayimFriedman2 commented 3 years ago

The problem is that reading an uninitialized memory is always UB in C, and thus when nextToken() executes parser->previous = parser->current; (but current is uninitialized), it executes UB, allowing the compiler to remove this line or even the whole program (or format your hard drive).

ruby0x1 commented 3 years ago

I already said it's better to initialize them explicitly, which will happen soon, don't worry about your files on your PC until then (hyperbole doesn't make a point stronger in practice fwiw).

ChayimFriedman2 commented 3 years ago

I just wanted to note it's not just better, but in fact required to conform the C standard. Sorry if I wasn't explicit enough.

graphitemaster commented 3 years ago

Yeah, reading the uninitialized value invokes UB, it's why ThinLTO miscompiles the code and removes the assignments in nextToken because the linker optimizations assume code does not invoke UB. There's a free license to remove the code that would / does invoke UB since it shouldn't happen. Basically doesn't matter if eventually it gets cleared out correctly, one read of an uninitialized value is all it takes with aggressive optimizations sadly. Not saying I agree with how the standard works here or anything. Just observing how and why, pedantry aside, it's a simple necessary fix in my case.

cxw42 commented 3 years ago

Fixed in #1012 :)