nginx / njs

A subset of JavaScript language to use in nginx
http://nginx.org/en/docs/njs/
BSD 2-Clause "Simplified" License
1.15k stars 150 forks source link

block scoped function definitions support. #134

Closed drsm closed 5 years ago

drsm commented 5 years ago

overview function definitions are block scoped:

root@node:~# cat test.mjs 

function test() {
    console.log('top');
}

{
    setImmediate(test);
    function test() {
        console.log('block1');
    }
}

{
    function test() {
        console.log('block2');
    }
    setImmediate(test);

    {
        function test() {
            console.log('block3');
        }
        setImmediate(test);
    }
}

test();
root@node:~# node --experimental-modules test.mjs 
(node:4894) ExperimentalWarning: The ESM module loader is experimental.
top
block1
block2
block3

only top or block level definitions are allowed:

root@node:~# cat test1.mjs 
if (true) function test() {}
root@node:~# node --experimental-modules test1.mjs 
(node:4976) ExperimentalWarning: The ESM module loader is experimental.
file:///root/test1.mjs:1
if (true) function test() {}
          ^^^^^^^^

SyntaxError: In strict mode code, functions can only be declared at top level or inside a block.
    at translators.set (internal/modules/esm/translators.js:51:18)

and some weird resolution rules:

root@node:~# cat test2.mjs 
// At the top level of a function, or script, 
// function declarations are treated like var declarations
// rather than like lexical declarations.
function fblock() {
    var test;
    function test() {
    }
}

// But not at the top level of a module!
function test() {}
var test;
root@node:~# node --experimental-modules test2.mjs 
(node:5018) ExperimentalWarning: The ESM module loader is experimental.
file:///root/test2.mjs:12
var test;
    ^

SyntaxError: Identifier 'test' has already been declared
    at translators.set (internal/modules/esm/translators.js:51:18)
drsm commented 5 years ago

@xeioex @hongzhidao

only top or block level definitions are allowed:

root@node:~# cat test1.mjs 
if (true) function test() {}
root@node:~# node --experimental-modules test1.mjs 
(node:4976) ExperimentalWarning: The ESM module loader is experimental.
file:///root/test1.mjs:1
if (true) function test() {}
          ^^^^^^^^

SyntaxError: In strict mode code, functions can only be declared at top level or inside a block.
    at translators.set (internal/modules/esm/translators.js:51:18)

this part is unrelated to lexical scoping. actually it is a bug.

hongzhidao commented 5 years ago

@drsm

My understand.

  1. functions can only be declared at top level or inside a block.
  2. functions scope is at where it declared.
  3. with if (1) x, there is no scope inside if.

Right?

hongzhidao commented 5 years ago

deleted.

drsm commented 5 years ago

There are four cases:

  1. function definition at top level of module.
  2. function definition at top level of another function.
  3. function definition inside a block.
  4. function definition at top level of script.
    // 1. or 4. depends on vm->options.module
    function f() {}
    function a() {
    // 2.
    function f() {}
    }
    function b() {
    {
        // 3.
        function f() {}
    }
    }

    Cases 1. and 3. are identical. I use block in tests because node's REPL doesn't support evaluation as a module.

    • the definition behave like lexically declared variable (let f)
      root@node:~# node --use-strict
      > ;{ function f() {} function f() {} }
      SyntaxError: Identifier 'f' has already been declared
      > ;{ function f() { return 1; } { function f() { return 2; } } f(); }
      1
      > ;{ { function f() { return 1; } f(); } function f() { return 2; } }
      1
      > ;{ var f; function f() {} }
      SyntaxError: Identifier 'f' has already been declared
      > ;{ let f; function f() {} }
      SyntaxError: Identifier 'f' has already been declared
      > ;{ function f() {} f = 1; f}
      1
    • the definition is hoisted to the top of the block and initialized there. so there is no TDZ.
      > ;{ typeof f; function f() {} }
      'function'

The cases 2. and 4. are identical.

We have to change our module wrapper from

function() {
/* module code */
}

to

() => {
    if (true) {
        /* module code */
    }
}

to support the feature.

hongzhidao commented 5 years ago

@drsm thanks.

I rearranged the rules you said.

  1. There are three cases: top scope, function scope, and block scope.
  2. function declaration should behave like let in block scope.
drsm commented 5 years ago

@hongzhidao

see #115 There are two modes of execution of a source file:

They have different semantics. A file which import's anything is considered a module. So, while the main file runs in top scope (this need to be fixed, but not here), it doesn't matter, the rules for modules are applied.

So,

2. function declaration should behave like let (except TDZ) in block scope and top scope.

root@node:~# eshost -m -x 'function f() {} function f() {}'
#### ch

#### jsc

#### sm

SyntaxError: redeclaration of function f:

#### v8

SyntaxError: Identifier 'f' has already been declared

#### xs

SyntaxError: duplicate variable f
hongzhidao commented 5 years ago

@drsm

  1. Now, the module is different from the script in NJS. the scope of the script is GLOBAL, the scope of the module is FUNCTION. Do you think it's reasonable?

  2. Which CLI comply with ES6? It seems nodesj and chrome/v8 do not behave 100% with ES6.

  3. function declaration should behave like let (except TDZ) in block scope and top scope.

It's not hard to do it, but I need to deep into test262 first suggested by @xeioex . There are too details while implementing these grammers.

drsm commented 5 years ago

Now, the module is different from the script in NJS. the scope of the script is GLOBAL, the scope of the module is FUNCTION. Do you think it's reasonable?

I think, this will be fixed while doing "imports are readonly views of exports" feature. Then every file will have its own MODULE scope, and GLOBAL will not be directly accessible (except a hack new Function('return this')). Also, for now, if we change our module wrapper as suggested above, the scope of the module will be BLOCK.

  1. Which CLI comply with ES6? It seems nodesj and chrome/v8 do not behave 100% with ES6.

I use node v11.14 with --experimental-modules file.mjs options (mjs extension is required) to run files, and eshost -m -x 'code' as suggested here. Did not find anything with full-featured REPL which supports module execution mode.

function declaration should behave like let (except TDZ) in block scope and top scope.

It's not hard to do it, but I need to deep into test262 first suggested by @xeioex . There are too details while implementing these grammers.

Yes, there is a lot of "do not break the web" legacy stuff in the spec.

hongzhidao commented 5 years ago

@xeioex @drsm

  1. Functions can only be declared at top level or inside a block.
  2. function declaration should behave like let (except TDZ) in block scope.
  3. Block scoped function definitions support. https://gist.github.com/hongzhidao/0504d557c82b82daf66b890d09e2883a

function declaration should behave like let (except TDZ) in block scope and top scope.

BTW. I didn't restrict top scope (easy to add), but are you sure? @drsm

drsm commented 5 years ago

@hongzhidao

thanks, works fine with blocks!

BTW. I didn't restrict top scope (easy to add), but are you sure? @drsm

OK

Left:

hongzhidao commented 5 years ago

@drsm @xeioex

Improved function declaration.

  1. Functions can only be declared at top level or inside a block.
  2. Function declaration should behave like let (except TDZ) in block scope, global scope and module scope.
  3. Block scoped function definitions support. https://gist.github.com/hongzhidao/0504d557c82b82daf66b890d09e2883a

function declaration should behave like let (except TDZ) in block scope and top scope.

What it behave in accumulate mode?

xeioex commented 5 years ago

The following test still fails: https://gist.github.com/xeioex/060a5f175fadc9b7e44ec9f2dc641f58

build/njs -d -q test.js 
ReferenceError: "f" is not defined in 114

ReferenceError is thrown during compilation, but that looks like a separate feature to do.

hongzhidao commented 5 years ago

@xeioex

ReferenceError is thrown during compilation, but that looks like a separate feature to do.

Yes. I think we need to finish scope feature first suggested by @drsm. Now it's OK and easily to add let. With let, the hard thing is to support for let, I have no clear idea yet, others seem ready.

drsm commented 5 years ago

What it behave in accumulate mode?

Good question! Maybe introduce some permissive REPL mode where reinitialization of top-level lexically declared vars is allowed? To prevent situations like this:

$ node --use-strict
> function fail() { throw null; }
undefined
> let x = fail();
Thrown: null
> x
ReferenceError: x is not defined
> x = 1
ReferenceError: x is not defined
> // because of TDZ
undefined
> let x = 2;
SyntaxError: Identifier 'x' has already been declared
> // Oops...
xeioex commented 5 years ago

@hongzhidao

https://gist.github.com/hongzhidao/0504d557c82b82daf66b890d09e2883a rev 3

 language/function-code/S10.4_A1.1_T1 passed in strict mode 
 === language/function-code/block-decl-onlystrict failed in strict mode === 
 --- errors ---  
- TypeError: cannot get property "constructor" of undefined
-    at anonymous (:52)
-    at main (native)===
+ ReferenceError: "f" is not defined in 115===
 === language/function-code/each-param-has-own-non-shared-eval-scope failed in strict mode === 
 --- errors ---  
  SyntaxError: Unexpected token "=" in 112===
@@ -63382,10 +63380,7 @@
 language/global-code/S10.1.7_A1_T1 passed in strict mode 
 === language/global-code/block-decl-strict failed in strict mode === 
 --- errors ---  
- Test262Error: Expected a ReferenceError to be thrown but no exception was thrown at all
-    at anonymous (:22)
-    at anonymous (:82)
-    at main (native)===
+ ReferenceError: "f" is not defined in 112===

https://gist.github.com/hongzhidao/0504d557c82b82daf66b890d09e2883a rev4

this patch breaks a bunch of tests (as well as harness tests).

https://gist.github.com/xeioex/422cca01a8611e627001806949f04c2d

In order not to break them vm->option.module should be considered, I guess.

In unit tests, I suggest to add module_unit_tests (like njs_timezone_optional_test in https://github.com/nginx/njs/blob/master/njs/test/njs_unit_test.c#L13277) for unit tests in module mode.

drsm commented 5 years ago

this patch breaks a bunch of tests (as well as harness tests).

looks like they are regenerating their error subclass every time in the global scope.

diff --git a/harness/sta.js b/harness/sta.js
index 9083b38f36..cece870c9a 100644
--- a/harness/sta.js
+++ b/harness/sta.js
@@ -9,7 +9,7 @@ description: |
 ---*/

-function Test262Error(message) {
+var Test262Error = function Test262Error(message) {
   this.message = message || "";
 }

@xeioex I forget how to run it, please help

xeioex commented 5 years ago

@drsm

The updated instruction is here: https://gist.github.com/xeioex/3cd6b4a5bd4ed856b753645b063ae282

xeioex commented 5 years ago

@drsm

looks like they are regenerating their error subclass every time in the global scope.

test262 includes ES5 tests, isn't for ES5 function redeclaration is OK? even in strict mode?

So, I am in favour both script (to increase the test coverage of test262) as well as module mode support. For njs in nginx module mode will be set to true in the near future.

drsm commented 5 years ago

@xeioex Thanks!

test262 includes ES5 tests, isn't for ES5 function redeclaration is OK? even in strict mode?

yes it is OK at top level. and it works very strange within blocks.

So, I am in favour both script (to increase the test coverage of test262) as well as module mode support.

Agree. I'm OK with the script mode at the top scope.

hongzhidao commented 5 years ago

@xeioex

  1. Can you show the steps you run for the patch? https://gist.github.com/hongzhidao/0504d557c82b82daf66b890d09e2883a I follow the instruction, but I don't know how to filter out the right information. https://gist.github.com/xeioex/3cd6b4a5bd4ed856b753645b063ae282

  2. What are the differences between script and module mode? In module mode, the script will behave like a module?

  3. What happens if a feature is a conflict between ES5 and ES6, do we choice ES6?

  4. Do you know what's the problem with the patch?

xeioex commented 5 years ago

I follow the instruction, but I don't know how to filter out the right information.

  1. use --command='<full path to njs cli binary> -q' to disable filenames in output of exceptions (randoms filenames are used by tests)
  2. you need to run test262 2 times before and after the test, to see the diff diff -u test262_njs_orig.log test262_njs_after.log
xeioex commented 5 years ago

What happens if a feature is a conflict between ES5 and ES6, do we choice ES6?

we need to do both, enabling ES6 only features only if vm->options.module is enabled.

What are the differences between script and module mode?

See this

Currently only undefined global this is implemented for vm->options.module.

drsm commented 5 years ago

4. Do you know what's the problem with the patch?

fixed The terms module and script there refer to the current execution mode defined by vm->options.module.

Like -m option does it in eshost:

# eshost -h v8 -m -x 'function f() {} function f() {}'
#### v8

SyntaxError: Identifier 'f' has already been declared

root@node:~# eshost -h v8 -x 'function f() {} function f() {}'
#### v8

vm->options.module effective only at top scope.

hongzhidao commented 5 years ago

@xeioex @drsm

In NJS, we make rules.

  1. The script file will turn into a module if setting vm->options.module.
  2. function declaration should behave like let (except TDZ) in block scope and module mode.

Right?

drsm commented 5 years ago

@hongzhidao Yes. And the setting applies to the top scope of a file which contains the entry point (the script file in your terms) only. Everything import'ed is executed in module mode as there is a block generated by the wrapper function.

hongzhidao commented 5 years ago

@xeioex @drsm

deleted.

BTW, I like the design of introducing module mode, it's clear.

So as I understand.

  1. It's allowed to declare duplicated function at the top in ES5, and it's illegal in ES6?
  2. module is introduced in ES6, and the duplicated function declarations are not allowed in the top of module?

So the rule is simplified as: function declaration should behave like let (except TDZ) in module mode. (including block scope of course)

Right?

hongzhidao commented 5 years ago

@xeioex @drsm

Here are the patches.

  1. Fixed unit tests.
  2. Restricted function declaration to top level or inside a block.
  3. Added block scoped function definitions support. https://gist.github.com/hongzhidao/4f4e902bfec4a44834f93aa77dbfb19d
drsm commented 5 years ago

@hongzhidao

declaration_exception.js is missing

Problem with lexically declared name:

;{ function f() {} { var f }}
hongzhidao commented 5 years ago

@hongzhidao

declaration_exception.js is missing

Problem with lexically declared name:

;{ function f() {} { var f }}

I updated it again, spliting it into separate patches :) https://gist.github.com/hongzhidao/4f4e902bfec4a44834f93aa77dbfb19d

Problem with lexically declared name:

;{ function f() {} { var f }}

Is it a problem? Get it, yes.

drsm commented 5 years ago

@hongzhidao

Is it a problem? (on it)

Yes. If we have something lex declared, it's forbidden to var declare anything in inner blocks also

hongzhidao commented 5 years ago

@drsm @xeioex

Updated. https://gist.github.com/hongzhidao/4f4e902bfec4a44834f93aa77dbfb19d

xeioex commented 5 years ago

@hongzhidao

njs_variable_t *
njs_variable_add(njs_vm_t *vm, njs_parser_scope_t *scope, nxt_str_t *name,
    uint32_t hash, njs_variable_type_t type)
{
    njs_variable_t      *var;
    nxt_lvlhsh_query_t  lhq;

    lhq.key_hash = hash;
    lhq.key = *name;
    lhq.proto = &njs_variables_hash_proto;

    var = njs_variable_scope_add(vm, scope, &lhq, type);
    if (nxt_slow_path(var == NULL)) {
        return NULL;
    }

    if (type == NJS_VARIABLE_VAR && scope->type == NJS_SCOPE_BLOCK) {
        /* A "var" declaration is stored in function or global scope. */
        do {
            scope = scope->parent;

            var = njs_variable_scope_add(vm, scope, &lhq, type);
            if (nxt_slow_path(var == NULL)) {
                return NULL;
            }

        } while (scope->type == NJS_SCOPE_BLOCK);
    }

    if (type == NJS_VARIABLE_FUNCTION) {
        var->type = type;
    }

    return var;
}

The code above inserts a variable into multiple scopes (into function and block scopes): (function() {{var f = 1; f+=1}})

Seems minor, but I am not sure whether it can cause any troubles later.

hongzhidao commented 5 years ago

@xeioex

The code above inserts a variable into multiple scopes (into function and block scopes)

I thought about it. Essentially we need two different things to store the added variable information: scope->declared_vars and scope->variables. scope-> declared_vars is required in each scope and will be used to check duplicated declarations as needed, scope->variables is kept the current way. Now we combine them together.

BTW, how do you get the result like below? the percentage of pass, for example 25.2%.

- - Passed 6785 tests (25.0%)
- - Failed 20334 tests (75.0%)
+ - Passed 6831 tests (25.2%)
+ - Failed 20288 tests (74.8%)
xeioex commented 5 years ago

@hongzhidao

BTW, how do you get the result like below? the percentage of pass, for example 25.2%.

you need --summary option.