stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
138 stars 44 forks source link

[BUG] Bogus parsing error with Stanc3 JS >= 2.34 - but only with QuickJS #1425

Closed andrjohns closed 1 month ago

andrjohns commented 1 month ago

This is a bit of an odd one but for some reason the Stanc3 JS is throwing a false error for too many indexes, but only with QuickJS:

tempjs <- tempfile(fileext = ".js")
download.file("https://github.com/stan-dev/stanc3/releases/download/v2.35.0-rc1/stanc.js",
              destfile = tempjs)

qjs_ctx <- QuickJSR::JSContext$new(stack_size = 0)
qjs_ctx$source(tempjs)

v8_ctx <- V8::v8()
v8_ctx$source(tempjs)

code <- " functions {
  real test(real x) {

    matrix[1,1] T_i;
    T_i[1,1] = 1;

    return 0;
  }
}"

qjs_res <- qjs_ctx$call("stanc", "model_name", code)
cat(qjs_res$errors)
#> 0 Semantic error in 'string', line 5, column 4 to column 12:
#>    -------------------------------------------------
#>      3:    
#>      4:      matrix[1,1] T_i;
#>      5:      T_i[1,1] = 1;
#>              ^
#>      6:      
#>      7:      return 0;
#>    -------------------------------------------------
#> 
#> Too many indexes, expression dimensions=2, indexes found=2.
v8_res <- v8_ctx$call("stanc", "model_name", code)
cat(v8_res$errors)
#>

The same error is present with 2.34, but not with 2.33:

tempjs <- tempfile(fileext = ".js")
download.file("https://github.com/stan-dev/stanc3/releases/download/v2.33.0/stanc.js",
              destfile = tempjs)

qjs_ctx <- QuickJSR::JSContext$new(stack_size = 0)
qjs_ctx$source(tempjs)

v8_ctx <- V8::v8()
v8_ctx$source(tempjs)

code <- " functions {
  real test(real x) {

    matrix[1,1] T_i;
    T_i[1,1] = 1;

    return 0;
  }
}"

qjs_res <- qjs_ctx$call("stanc", "model_name", code)
cat(qjs_res$errors)
#>

v8_res <- v8_ctx$call("stanc", "model_name", code)
cat(v8_res$errors)
#>

Any ideas what related code might have changed between 2.33 -> 2.34 in this area?

WardBrian commented 1 month ago

The most relevant change to the JS parser in 2.33->2.34 was updating OCaml and js_of_OCaml by a few versions.

This is definitely an odd one, I'll try to trace through some things tomorrow

WardBrian commented 1 month ago

Hi @andrjohns -

I looked into the differences in the generated code before/after the update.

Previous code

```javascript {var type$0=type,idcs$0=idcs; for(;;) {if(caml_call1(Middle_UnsizedType[46],type$0)) var scalar=3,rowvec=7,vec=6; else var scalar=1,rowvec=4,vec=2; if(idcs$0) {var switch$0=0; if(typeof type$0 === "number") switch(type$0) {case 5: case 8: var switch$1=0; if(595786329 <= idcs$0[1]) {var _yO_=idcs$0[2]; if(_yO_) {if(595786329 <= _yO_[1]) {if(! _yO_[2]){switch$0 = 1;switch$1 = 1}} else if(! _yO_[2])return vec} else {switch$0 = 1;switch$1 = 1}} else {var _yP_=idcs$0[2],switch$2=0; if(_yP_) if(595786329 <= _yP_[1]) {if(_yP_[2])switch$2 = 1} else if(_yP_[2]) switch$2 = 1; else {switch$0 = 2;switch$1 = 1;switch$2 = 1} if(! switch$2)return rowvec} break; case 2: case 4: case 6: case 7: var switch$3=0; if(-306849112 === idcs$0[1] && ! idcs$0[2]) {switch$0 = 2;switch$3 = 1} if(! switch$3 && ! idcs$0[2])switch$0 = 1; break } else if(0 === type$0[0]) {var _yQ_=type$0[1]; if(595786329 <= idcs$0[1]) {var tl=idcs$0[2];return [0,aux(_yQ_,tl)]} var idcs$1=idcs$0[2],type$0=_yQ_,idcs$0=idcs$1; continue} switch(switch$0) {case 2:return scalar; case 1:return type$0; default: caml_call1(Core_kernel[29],cst_findme); return error$0 (not_indexable (loc,ut,caml_call1(Core_kernel_List[7],indices)))}} return type$0}} ```

Current

```javascript for(;;){ if(caml_call1(Middle_UnsizedType[46], type$0)) var scalar = 3, rowvec = 7, vec = 6; else var scalar = 1, rowvec = 4, vec = 2; if(! idcs$0) return type$0; a: { if(typeof type$0 === "number"){ b: { switch(type$0){ case 5: case 8: c: { if(595786329 > idcs$0[1]){ var _ev_ = idcs$0[2]; if(_ev_){ if(595786329 > _ev_[1]){if(_ev_[2]) break c; break;} if(_ev_[2]) break c; } return rowvec; } var _eu_ = idcs$0[2]; if(! _eu_) break b; if(595786329 <= _eu_[1]){if(! _eu_[2]) break b;} else if(! _eu_[2]) return vec; } break a; case 2: case 4: case 6: case 7: if(-306849112 === idcs$0[1] && ! idcs$0[2]) break; if(idcs$0[2]) break a; break b; default: break a; } return scalar; } return type$0; } if(0 === type$0[0]){ var type$1 = type$0[1]; if(595786329 <= idcs$0[1]){ var tl = idcs$0[2]; return [0, aux(type$1, tl)]; } var idcs$1 = idcs$0[2], type$0 = type$1, idcs$0 = idcs$1; continue; } } caml_call1(Core[31], cst_findme); var _et_ = caml_call1(Core_List[17], indices); return error(caml_call3(Frontend_Semantic_error[34], loc, ut, _et_)); } ```

Obviously not the easiest thing to read, but one thing that is clear is that the new code is using a lot of labeled break statements, which seem to be due to https://github.com/ocsigen/js_of_ocaml/pull/1496.

However, it sounds like quickjs has some known bugs in this area: https://github.com/bellard/quickjs/issues/275 https://github.com/bellard/quickjs/issues/282

It seems highly likely to me that the generated code is hitting an edge case of quickjs' runtime. I am not sure of anything we can do on the stanc3 side

andrjohns commented 1 month ago

Ah yep, that was it! If I rollback js_of_ocaml to the release prior (5.4.0), all works without issue.

I'll open a PR dropping the version down, it looks like there's not a huge difference in file sizes between the two at least:

-r--r--r--@ 1 andrew  staff  1718231 May 21 19:35 stancjs_540.bc.js
-r--r--r--@ 1 andrew  staff  1572772 May 21 19:33 stancjs_551.bc.js