wren-lang / wren-cli

A command line tool for the Wren programming language
MIT License
129 stars 30 forks source link

Using an undefined variable in the REPL will silently initialize it to 1 #62

Open heyajulia opened 3 years ago

heyajulia commented 3 years ago

I just noticed that using an undefined variable in the REPL will silently initialize it to 1:

> a + 1
[repl line 1] Error at 'a': Variable is used but not defined.
> a + 1
2
> a
1

I'm using Wren 0.3.0 built from the latest main as I'm writing this (i.e. b82cf5a).

ChayimFriedman2 commented 3 years ago

I've investigated this a little.

First thing to understand is that the value 1 is not because you did + 1: if you would do + 8 this would be the same. This is 1 because a is implicitly declared at line 1 (in REPL scripts, each new expression starts fresh line counting, so its first line is always 1).

Why?

Because if we hit an undeclared global variable, we declare it with a fancy placeholder value in the form of the source code line (I don't know who had the idea, but this is cool - we need to initialize it to something, so why not doing something funny!).

Here:

  if (variable.index == -1)
  {
    // Implicitly define a module-level variable in
    // the hopes that we get a real definition later.
    variable.index = wrenDeclareVariable(compiler->parser->vm,
                                         compiler->parser->module,
                                         token->start, token->length,
                                         token->line);

The problem is that I suspect it will not be easy, if possible at all, to fix this. Wren semantics allows using a module variable before its declaration. Because we have a single pass compiler, we cannot differentiate valid use cases like:

class C {
  m() { ModuleVariable }
}
var ModuleVariable = 5

And illegitimate uses like the one you posted.

A better thing to do is to initialize with null. I'll send a PR soon Edit: See my comment below. We may introduce some internal undefined value, initialize only declared variables to it and check at runtime if an accessed variable contains it (meaning it was not defined), but this will add runtime cost, so I'm pretty sure this won't be done.

But anyway, thank you for notifying us!

heyajulia commented 3 years ago

Thank you for the detailed response!

On Fri, 5 Mar 2021 at 01:26 Chayim Refael Friedman notifications@github.com wrote:

I've investigated this a little.

First thing to understand is that the value 1 is not because you did + 1: if you would do + 8 this would be the same. This is 1 because a is implicitly declared at line 1 (in REPL scripts, each new expression starts fresh line counting, so its first line is always 1).

Why?

Because if we hit an undeclared global variable, we declare it with a fancy placeholder value in the form of the source code line (I don't know who had the idea, but this is cool - we need to initialize it to something, so why not doing something funny!).

Here https://github.com/wren-lang/wren/blob/5b290cacc54e4810fd4008de14fc0b5008eaa450/src/vm/wren_compiler.c#L2264-L2271 :

if (variable.index == -1) { // Implicitly define a module-level variable in // the hopes that we get a real definition later. variable.index = wrenDeclareVariable(compiler->parser->vm, compiler->parser->module, token->start, token->length, token->line);

The problem is that I suspect it will not be easy, if possible at all, to fix this. Wren semantics allows using a module variable before its declaration. Because we have a single pass compiler, we cannot differentiate valid use cases like:

class C { m() { ModuleVariable } } var ModuleVariable = 5

And illegitimate uses like the one you posted.

A better thing to do is to initialize with null. I'll send a PR soon. We may introduce some internal undefined value, initialize only declared variables to it and check at runtime if an accessed variable contains it (meaning it was not defined), but this will add runtime cost, so I'm pretty sure this won't be done.

But anyway, thank you for notifying us!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/wren-lang/wren-cli/issues/62#issuecomment-791051594, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQLZY2PN3CY3SSDXPPHENJ3TCAQKXANCNFSM4YUF2BIQ .

ChayimFriedman2 commented 3 years ago

If this wasn't obvious, this can be reproduced even without the REPL, but it requires some meta artifact:

import "meta" for Meta

Meta.compile("a")
System.print(Meta.compileExpression("a").call()) // 1

Unfortunately, the playground doesn't currently allow meta and random so this is not possible to reproduce this there (but you can using the CLI).

BTW, @ruby0x1 you merged a PR that allowed these two modules in the playground, right?

heyajulia commented 3 years ago

Cool, thanks!

On Fri, 5 Mar 2021 at 01:36 Chayim Refael Friedman notifications@github.com wrote:

If there wasn't obvious, this can be reproduced even without the REPL, but it requires some meta artifact:

import "meta" for Meta

Meta.compile("a") System.print(Meta.compileExpression("a").call()) // 1

Unfortunately, the playground doesn't currently allow meta and random so it isn't possible to reproduce it there (but you can using the CLI).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/wren-lang/wren-cli/issues/62#issuecomment-791055696, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQLZY2MUYMJ7GSB6T4BXKWDTCARQZANCNFSM4YUF2BIQ .

ChayimFriedman2 commented 3 years ago

So actually, this issue relates to the language repo 😉.

ChayimFriedman2 commented 3 years ago

It seems there is a problem to set the value to null: it is used later to achieve the line where it was declared. I'm pulling the promise to PR back 😛.

joshgoebel commented 3 years ago

@ruby0x1

Am I right in thinking that if we hooked vm->config.errorFn we could capture these errors, parse the resulting message, and then run our own code to set defaults? e.g:

Error at 'a': Variable is used but not defined.

We pull out the a and run: a = null.

Would that be a potential approach or is this a documentation thing where we should improve the error messaging so it explains these variables TRULY should be 1? I see the comment:

This also marks the compilation as having an error, which ensures that the resulting code will be discarded and never run.

Yet I'm not sure it matters if it's run if it permanently alters the state of the compiler... ? I guess I'd hope if there was a serious error that perhaps the compiler state could be thrown away or rewound?

Any thoughts on this one?

ChayimFriedman2 commented 3 years ago

Am I right in thinking that if we hooked vm->config.errorFn we could capture these errors, parse the resulting message, and then run our own code to set defaults? e.g:

Error at 'a': Variable is used but not defined.

We pull out the a and run: a = null.

I'm not sure, I can investigate this a while, but either way, the variable will still be implicitly defined - just initialized to null, which indeed makes more sense.

Yet I'm not sure it matters if it's run if it permanently alters the state of the compiler... ? I guess I'd hope if there was a serious error that perhaps the compiler state could be thrown away or rewound?

I don't understand.

joshgoebel commented 3 years ago

just initialized to null, which indeed makes more sense.

But then I realized you have cases like:

a = b + c

While finding a to be equal to null later is surely better than 1 this still leaves something to be desired. It seems like the correct solution here would be:

> a = b + c
[repl line 1] Error at 'a': Variable is used but not defined.
[repl line 1] Error at 'b': Variable is used but not defined.
[repl line 1] Error at 'c': Variable is used but not defined.
> a = b + c
[repl line 1] Error at 'a': Variable is used but not defined.
[repl line 1] Error at 'b': Variable is used but not defined.
[repl line 1] Error at 'c': Variable is used but not defined.

I feel like there is an implicit guarantee here:

which ensures that the resulting code will be discarded and never run.

IE, that this resulting code (that failed to compile) therefore would "have no side effects" , but that's not true. Perhaps any variable like this should be "unwound" and removed from the compiler state if a compile error is triggered?

So while compiling we'd flag variables as "not defined", then then if we hit a compiler error we'd remove those variables - as if the compiler had never seen them. But does that still leave the larger problem that compiling code with errors can still leave the VM in an "unknown state"?

joshgoebel commented 3 years ago

Example, this is surprising behavior:

> class Animal { } typo
[repl line 1] Error at 'typo': Expect end of file.
> class Animal {}
[repl line 1] Error at 'Animal': Module variable is already defined.

This seems a bit more understandable though (if it was documented).

ChayimFriedman2 commented 3 years ago

I agree, the question is what's the overhead.

joshgoebel commented 3 years ago

I'm guessing it's non-trivial, but the "backing out undefined variables" thing might not be so difficult since we already know what they are since we warn about them... so maybe this is two discrete issues and one is easy and one is hard.

DerThorsten commented 2 years ago

This issue makes it really hard to use wren in a REPL session. In particular using wren from jupyter suffers a lot from this issue. feel free to use wren in jupyterlite and see how much this issue is limiting the REPL workflow https://jupyterlite.github.io/demo/lab?path=xeus-wren%2Fiwren.ipynb