ocsigen / js_of_ocaml

Compiler from OCaml to Javascript.
http://ocsigen.org/js_of_ocaml/

RFC: ecmascript modules as primary build artifact #1161

Open cdaringe opened 3 years ago

cdaringe commented 3 years ago

context

JSOO has long been the sole, de facto provider mapping OCaml input to JavaScript output. This project has existed for well over a decade, and its primary function is to take users' ML and convert it to immediately invocable or effectful JavaScript. This continues to be a great accomplishment.

Over the past few years, the ECMAScript spec has evolved and developed a proper module system.

This development is of interest because a stable module system gives ML a greater opportunity to participate in the JavaScript ecosystem.

proposal

Provide a first-class compile target for ESM.

justification

hypothetical

The following hypothetical cases may be completely bogus. Consider these my temporary, envisioned target state, even if they are not necessarily achievable.

case - empty

Input:

(* main.ml *)
(* no content *)

Output:

// main.js
// no content

No source, no dist! 75k worth of savings, vs status quo :)

case - hello world

Input:

(* main.ml *)
let () = print_endline "Hello, world!"

Output (a):

// main.js
console.log("Hello, world!")

Output (b):

// main.js
import { print_endline } from "@jsoo/std";

print_endline("Hello, world!")

case - reduce

Input:

(* main.ml *)
let rec sum = function
  | [] -> 0
  | head::tail -> head + (sum tail)

Output:

// main.js
import { fn, match_case, match } from "@jsoo/fn";
import { pattern_length } from "@jsoo/lists";

export const sum = fn(
  match_case(match(pattern_length(0, true)), () => 0),
  match_case(match(pattern_length(1, false)), ([head, ...tail]) => head + sum(tail))
);
// ^or whatever the equivalent output would need to be,
// as this is just pseudo-code

Where the runtime is partitioned into ESM modules as well. E.g.:

// @jsoo/caml_runtime
export const caml_lists_iter_whatever = (a, b, c) => { /*  */ };
// ... all the caml_ stuff!

// @jsoo/lists
export const pattern_length = (len, exact) => x => exact 
  ? x.length === len 
  : x.length >= len;

// @jsoo/fn
export const match = x => pattern => pattern(x);

// @jsoo/operators
export const caml_neg_float = x => (-x);
// ...

It may be the case that the emitted module is more along the lines of:

// main.js
import { caml_apply, caml_match_case, caml_match, caml_pat } from "@jsoo/runtime";
// ^ pseudocode, clearly. not an expert in the caml_ bindings, or the
// feasibility of such a mapping :)

export const sum = caml_apply(
  caml_match_case(caml_match(caml_pat(0, true)), () => 0),
  caml_match_case(caml_match(caml_pat(1, false)), ([head, ...tail]) => head + sum(tail))
);

And the full runtime is implemented as one plain, all-encompassing ES module.

What's nice about this, is that only the bare minimum import graph gets used, versus a full runtime!
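For comparison with the `sum` pseudo-code above, here is a minimal runnable sketch of what a direct translation could look like. It assumes the list encoding js_of_ocaml uses for blocks (`[]` as the number `0`, `head :: tail` as the array `[0, head, tail]`); `ofArray` is a hypothetical helper added only for the demo, and the code is not actual compiler output.

```javascript
// Assumption: [] is encoded as 0 and (head :: tail) as [0, head, tail].

// Hypothetical helper: build an encoded list from a plain JS array.
const ofArray = (items) =>
  items.reduceRight((tail, head) => [0, head, tail], 0);

// let rec sum = function
//   | [] -> 0
//   | head :: tail -> head + sum tail
const sum = (list) => {
  if (list === 0) return 0; // the [] case
  const head = list[1];
  const tail = list[2];
  return (head + sum(tail)) | 0; // keep the result a 32-bit integer
};

console.log(sum(ofArray([1, 2, 3]))); // 6
```

With an encoding like this, the pattern match needs no runtime helpers at all, which is one reason a per-helper import graph may matter less for simple code than the pseudo-code suggests.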

omissions

Omitted from this discussion are

We could crack into all of these as interested!

user experience

I didn't see any conversations on this topic on GitHub, but may have missed them. Sorry if this is a duplicate! If it is, please feel free to close it eagerly!

hhugo commented 3 years ago

Thanks for taking the time to write all this.

I want to clarify one aspect for people reading this RFC:

"the full runtime is compiled, even if only a subset is used"

"What's nice about this, is that only the bare minimum import graph gets used, versus a full runtime!"

js_of_ocaml is designed to only include the part of the runtime that it needs. It does not blindly include the whole runtime for no reason.

$ cat test.ml 
let () = print_endline "Hello, world!"
$ ocamlc test.ml -o test.bc
$ js_of_ocaml test.bc
$ ls -lh test.js 
20K test.js

One of the reasons for the size of the included runtime is that we try hard to keep the same semantics as regular OCaml. For example, print_endline is not translated to a plain console.log; instead the output is buffered and flushed when needed, as in regular OCaml.
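To make the buffering point concrete, here is a small sketch of a buffered output channel. Everything in it (`OutChannel`, `printEndline`, the capacity of 64) is a hypothetical illustration, not the actual jsoo runtime code; it only shows why writes and flushes are separate steps.

```javascript
// Hypothetical buffered output channel; not the real jsoo runtime.
class OutChannel {
  constructor(write, capacity = 64) {
    this.write = write;       // sink called with each flushed chunk
    this.capacity = capacity; // flush once this many characters are pending
    this.pending = "";
  }

  outputString(s) {
    this.pending += s;
    if (this.pending.length >= this.capacity) this.flush();
  }

  flush() {
    if (this.pending.length === 0) return;
    this.write(this.pending);
    this.pending = "";
  }
}

const chunks = [];
const stdout = new OutChannel((chunk) => chunks.push(chunk));

// Mirrors OCaml's print_endline: write the string, write a newline, flush.
const printEndline = (s) => {
  stdout.outputString(s);
  stdout.outputString("\n");
  stdout.flush();
};

stdout.outputString("Hello, "); // stays in the buffer, nothing written yet
printEndline("world!");         // everything pending goes out as one chunk
console.log(chunks);
```

A bare `console.log` per call would emit "Hello, " on its own line, which is not what the OCaml program means.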

cdaringe commented 3 years ago

Thanks for the clarification, I’ll have to go back and revisit why the artifact I produced was so far off.

hhugo commented 3 years ago

There are two ways to compile with jsoo:

Separate compilation (probably what you used), the default when using dune. Libraries and modules are compiled individually to JavaScript and then linked together. In that mode there is no dead-code elimination and the runtime is included in full. This mode should be used during development because of its short feedback loop.

Whole program compilation, used in dune with --profile release. In that mode, jsoo performs dead-code elimination and only includes the parts of the runtime it needs. Depending on the size of the dependencies, compilation can easily take 10s or more.
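The difference between the two modes can be sketched as a reachability walk over the runtime. The dependency table below is invented for illustration (the names are modeled on `caml_` primitives, but the edges are made up), and `reachable` is a hypothetical helper, not how jsoo is implemented.

```javascript
// Hypothetical dependency table: each primitive lists the primitives it calls.
const runtimeDeps = {
  caml_ml_output: ["caml_ml_flush"],
  caml_ml_flush: [],
  caml_format_float: ["caml_parse_format"],
  caml_parse_format: [],
  caml_md5_string: [],
};

// Collect every primitive reachable from the ones the program references.
function reachable(deps, roots) {
  const keep = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const name = stack.pop();
    if (keep.has(name)) continue;
    keep.add(name);
    for (const dep of deps[name] ?? []) stack.push(dep);
  }
  return keep;
}

// Whole program: keep only what the program can reach.
const wholeProgram = reachable(runtimeDeps, ["caml_ml_output"]);
// Separate compilation: nothing is known about callers, so keep everything.
const separate = new Set(Object.keys(runtimeDeps));

console.log([...wholeProgram].sort());
console.log(separate.size);
```

The walk needs the whole program as input, which is why it cannot run when each library is compiled on its own.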

cdaringe commented 3 years ago

I discovered that any use of the Js module is what adds approximately 50 KB+ of JavaScript.

$ ocamlfind ocamlc -package js_of_ocaml -package js_of_ocaml-ppx -linkpkg  js/main.ml -o js/main.bc
$ js_of_ocaml compile --opt=3 js/main.bc
$ ls -h js/

Tried various other optimization flags to js_of_ocaml without getting under 70k with the Js module. More investigation is needed to better understand why that module is so problematic in the rendered output.

Nonetheless, I still lobby that yielding ESM would be an improved artifact to offer users :)

hhugo commented 3 years ago

The Js module uses Printexc, which in turn uses Printf. Here are the sizes of the stdlib modules compiled to JS. Note that jsoo does dead-code elimination and that using a module doesn't mean including all of it. However, CamlinternalFormat contains a large number of mutually recursive functions, which can explain why dead-code elimination doesn't do a good job on it.

$ js_of_ocaml ~/.opam/4.12.0/lib/ocaml/stdlib.cma --keep-unit-names -o .
$ wc -c ./*.js | sort -n -r
273406 total
 54197 ./CamlinternalFormat.js
 19626 ./Stdlib__scanf.js
 16547 ./Stdlib__format.js
 11169 ./Stdlib__ephemeron.js
 10577 ./Stdlib__list.js
  8916 ./Stdlib__hashtbl.js
  8870 ./CamlinternalOO.js
  8753 ./Stdlib__set.js
  8318 ./Stdlib__arg.js
  8242 ./Stdlib__map.js
  8139 ./Stdlib__float.js
  7911 ./Stdlib__bytes.js
  7693 ./Stdlib__filename.js
  6989 ./Stdlib__array.js
  6367 ./Stdlib__printexc.js
  6357 ./Stdlib__buffer.js
  6297 ./Stdlib.js
  5707 ./Stdlib__bigarray.js
  5549 ./Stdlib__weak.js
  5233 ./Stdlib__genlex.js
  4269 ./Stdlib__string.js
  4159 ./Stdlib__stream.js
  3335 ./Stdlib__random.js
  2981 ./Stdlib__gc.js
  2795 ./Stdlib__lexing.js
  2420 ./Stdlib__parsing.js
  2332 ./Stdlib__obj.js
  2274 ./CamlinternalFormatBasics.js
  1719 ./Stdlib__queue.js
  1619 ./Stdlib__seq.js
  1604 ./Stdlib__digest.js
  1483 ./Stdlib__int64.js
  1459 ./Stdlib__complex.js
  1377 ./Stdlib__result.js
  1212 ./Stdlib__stack.js
  1206 ./Stdlib__char.js
  1175 ./Stdlib__uchar.js
  1146 ./Stdlib__int32.js
  1110 ./Stdlib__printf.js
  1077 ./Stdlib__sys.js
  1006 ./Stdlib__option.js
   973 ./Stdlib__nativeint.js
   949 ./Stdlib__either.js
   903 ./Stdlib__marshal.js
   883 ./Stdlib__fun.js
   793 ./Stdlib__pervasives.js
   727 ./CamlinternalLazy.js
   612 ./Stdlib__bytesLabels.js
   538 ./Stdlib__listLabels.js
   449 ./Stdlib__lazy.js
   430 ./CamlinternalAtomic.js
   398 ./Stdlib__stringLabels.js
   366 ./Stdlib__arrayLabels.js
   358 ./Stdlib__int.js
   352 ./Stdlib__bool.js
   279 ./Stdlib__callback.js
   238 ./Stdlib__unit.js
   217 ./Stdlib__atomic.js
   208 ./Stdlib__moreLabels.js
   199 ./CamlinternalMod.js
   185 ./Stdlib__oo.js
   134 ./Stdlib__stdLabels.js

As you can see, the modules related to formatting are rather large.
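Here is a rough sketch of why mutual recursion hurts dead-code elimination: once one function in a cycle is referenced, every other member of the cycle is reachable too. The function names and byte sizes below are made up, and `retainedSize` is a hypothetical helper; only the shape of the call graph matters.

```javascript
// Made-up functions and sizes; the three fmt_* entries call each other.
const functions = {
  sprintf: { size: 100, calls: ["fmt_dispatch"] },
  fmt_dispatch: { size: 900, calls: ["fmt_padding", "fmt_custom"] },
  fmt_padding: { size: 700, calls: ["fmt_dispatch"] },
  fmt_custom: { size: 800, calls: ["fmt_dispatch", "fmt_padding"] },
  unrelated_helper: { size: 500, calls: [] },
};

// Total size of everything that must be kept once `root` is referenced.
function retainedSize(table, root) {
  const seen = new Set([root]);
  const queue = [root];
  let total = 0;
  while (queue.length > 0) {
    const name = queue.shift();
    total += table[name].size;
    for (const callee of table[name].calls) {
      if (!seen.has(callee)) {
        seen.add(callee);
        queue.push(callee);
      }
    }
  }
  return total;
}

console.log(retainedSize(functions, "sprintf")); // 2500
```

The 100-byte entry point drags in all 2400 bytes of the cycle, while the unrelated helper is still dropped.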

hhugo commented 2 years ago

related to https://github.com/ocsigen/js_of_ocaml/issues/551