ocaml-ppx / ppx_deriving_yojson

A Yojson codec generator for OCaml.
MIT License
155 stars 46 forks source link

"Stack overflow" exception in JavaScript execution mode (with Js_of_ocaml) #138

Open pbaudin opened 3 years ago

pbaudin commented 3 years ago

Within the JavaScript mode the compilation of the recursive functions is not optimized (c.f. http://ocsigen.org/js_of_ocaml/3.7.0/manual/tailcall).

There is the Javascript translation of your Ppx_deriving_yojson_runtime.map_bind function :

 function map_bind(f,acc,xs)
     {if(xs)
       {var
         xs$0=xs[2],
         x=xs[1],
         _b_=function(x){return map_bind(f,[0,x,acc],xs$0)};
        return symbol_bind(caml_call1(f,x),_b_)}
      return [0,caml_call1(Stdlib_list[9],acc)]}

So, the conversion of a Yojson.Safe.t data into a long list of elements (~10.000) could raise a Stack overflow exception.

Notice that kind of implementation (without acc) solves the problem:

let save_map_bind f xs =
  let exception Error_map_bind of string in
  try
    Result.Ok (List.rev (List.rev_map
                                     (fun x -> match f x with
                                      | Result.Ok x -> x
                                      | Result.Error s -> raise (Error_map_bind s)) xs))
  with Error_map_bind s -> Result.Error s

Of course, the proposed implementation can certainly be improved since the type of that save_map_bind function is less generic than your map_bind function (but it seems to me that the proposed implementation is compliant with the .mli interface except about the removed accumulator).

gasche commented 3 years ago

I believe that exception-raising is also fairly slow with js-of-ocaml, so I suspect that the version you propose could be a noticeable performance regression for small lists.

pbaudin commented 3 years ago

The performance regression mainly depends on the number of lists. Of course, if these lists are small it is more noticeable.

I looked at this implementation :

     let rec map_bind f acc xs =
       match xs with
       | x :: xs -> (match f x with
           | ((Result.Error _) as x) -> x
           | Result.Ok x -> map_bind f (x :: acc) xs)
       | [] -> Result.Ok (List.rev acc)

It seems to be ok with my example (no stack overflow), but I wasn't able to extract its javascript translation since the debugger shows me only my Ocaml source code ! So, I can't confirm that is the solution.

gasche commented 3 years ago

Yes; what you did is to inline the implementation of >>=, and I believe that the result is using a form of tail-recursion that is properly supported by js_of_ocaml. Would you like to propose a PR with this new implementation?

Note: what we learned here is that common monadic-programming idioms (recursive functions with bindings inside) are not properly supported by js_of_ocaml. I wondered if this may be a problem as well for the code generated by ppx_deriving_yojson. I inspected the uses of >>= in the codebase, and it appears that they are all in fact use-cases in which map and pair would suffice (in other words, we are only using the applicative structure of the monad), because they are of the form <foo> >>= (fun x -> <bar> >>= fun y -> Result.Ok (... x ... y ... )). This means that there may be no problem in practice, and that it should be possible to rewrite the code generator to guarantee that there is no problem -- if instead we wrote the code above like map (pair <foo> <bar>) (fun (x, y) -> Result.Ok (... x ... y ...), we would know for sure that the recursive calls are never "hidden" inside a closure like they are today. Let me know if you would be interested in contributing this change, I'm happy to provide more explanations.

pbaudin commented 3 years ago

I moved my latest implementation in a part where I can see the javascript translation :

    function safe_map_bind(f,acc,xs)
     {var acc$0=acc,xs$0=xs;
      for(;;)
       {if(xs$0)
         {var xs$1=xs$0[2],x=xs$0[1],x$0=caml_call1(f,x);
          if(0 === x$0[0])
           {var x$1=x$0[1],acc$1=[0,x$1,acc$0],acc$0=acc$1,xs$0=xs$1;continue}
          return x$0}
        return [0,caml_call1(Base_List[35],acc$0)]}}

That looks safe since there is no recursion into the generated code! I do hope you will modify your Ppx_deriving_yojson_runtime.ml in that way. Thanks.

gasche commented 3 years ago

I do hope you will modify your Ppx_deriving_yojson_runtime.ml in that way.

I asked above if you would be willing to send a Pull Request to implement the change. (Is this an implicit way to decline the invitation?). I think that a PR is nicer than a maintainer doing the change, in particular as it directly tracks authorship of the contribution.

pbaudin commented 3 years ago

I would have greatly preferred to let the authorship of that inline to the compiler!