rescript-lang / rescript-compiler

The compiler for ReScript.
https://rescript-lang.org

Change underlying rep of records to be plain javascript objects #2922

Closed. Risto-Stevcev closed this issue 4 years ago.

Risto-Stevcev commented 6 years ago

I know this would be a major change, but it seems to make much more sense for records to be represented as plain JavaScript objects instead of arrays. I.e.:

type foo = { bar: int; baz: string }
Js.log { bar = 123; baz = "hello" }

prints this in the console:

{ bar: 123, baz: "hello" }

The array-like version could still be supported under a separate type, e.g. something like foo Js.ArrayLike.t.

This would avoid the current opaque type, which can't be destructured, and the converters to/from records, which can be expensive, especially since most people will be dealing with JS objects from third-party libs, AJAX responses, etc.
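
For context, a sketch of what that same snippet compiles to under the array representation BuckleScript uses today (the emitted JS in comments is approximate and varies by compiler version):

type foo = { bar: int; baz: string }
let () = Js.log { bar = 123; baz = "hello" }
(* emitted JS, roughly:  console.log([123, "hello"])  *)
(* console output:       [ 123, 'hello' ]             *)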

@bobzhang @chenglou

anmonteiro commented 6 years ago

bs.deriving abstract "records" are represented as JS objects at runtime. Doesn't that solve the problem for you?
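
For reference, a minimal sketch of the feature anmonteiro mentions, BuckleScript's bs.deriving abstract (the exact generated accessor names, e.g. nameGet vs name, have varied across BuckleScript versions):

type person = {
  name: string;
  age: int;
} [@@bs.deriving abstract]

(* the type is abstract; you go through generated functions instead *)
let alice = person ~name:"Alice" ~age:30  (* generated creator; emits { name: ..., age: ... } *)
let n = nameGet alice                     (* generated getter *)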

Risto-Stevcev commented 6 years ago

No, because you can't destructure an abstract type. You lose all the nice functionality you get from record or ES6 destructuring. You can also write foo.bar on records, but not on abstract types.
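
For illustration, a sketch of the record features being given up (plain OCaml/BuckleScript records):

type person = { name: string; age: int }

let greet p = "Hello, " ^ p.name      (* direct field access *)
let describe { name; age } =          (* destructuring, with field punning *)
  name ^ " is " ^ string_of_int age
(* neither works on a bs.deriving abstract type *)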

wegry commented 6 years ago

As someone who has recently attempted to build bindings to JS libraries from BuckleScript, I think this would reduce quite a bit of pain on the developer side. Deriving abstract, and the resulting construction and usage of the types, is syntactic salt.

aptfx commented 6 years ago

I think this would be a bad decision. Not because arrays are more appropriate, but because it would unnecessarily constrain the internal representation of records to JS objects, which makes optimization harder. BS's current representation is one of the reasons records are so efficient. If you want interop, you can already allocate Js.t object types, which are represented as JS objects.
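
A sketch of the escape hatch aptfx describes, using BuckleScript's Js.t object literal (OCaml syntax; the emitted JS in comments is approximate):

(* a record: the compiler controls the representation (an array today) *)
type point = { x: int; y: int }
let p = { x = 1; y = 2 }            (* emits roughly: var p = [1, 2] *)

(* a Js.t object: guaranteed to be a plain JS object *)
let q = [%bs.obj { x = 1; y = 2 }]  (* emits: var q = { x: 1, y: 2 } *)
let qx = q##x                       (* JS object field access *)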

Risto-Stevcev commented 6 years ago

@aptfx Can you provide a concrete example of how much efficiency you would gain from the current implementation?

There might be some special cases where it's noticeably better, but in general I don't think the current version provides any noticeable perf gains for the vast majority of frontend apps. I don't think lists have any significant perf benefit over arrays either; modern web browsers are pretty good at optimizing arrays and objects.

Note that this wouldn't get rid of the efficient version; only the default would change. The efficient version would still be available via something similar to how you have 'a Js.t and [%obj someRecord] today, e.g. 'a Js.FastRecord.t and [%fastrec someRecord].
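
Sketched usage of that hypothetical opt-in (none of this API exists; Js.FastRecord.t and [%fastrec] are only the author's proposed names):

type foo = { bar: int; baz: string }

(* proposed default: plain JS object representation *)
let r = { bar = 123; baz = "hello" }

(* hypothetical opt-in back to the packed array representation *)
let fast : foo Js.FastRecord.t = [%fastrec { bar = 123; baz = "hello" }]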

aptfx commented 6 years ago

https://github.com/neonsquare/bucklescript-benchmark/blob/master/README.md

This includes the reasoning why hiding the representation is a big plus and makes BuckleScript better than TypeScript in this regard. Abstracting away the representation allows for much more implementation freedom and is the really big chance for a statically typed compiler like BuckleScript to give better results in important ways. Throwing that away out of a desire to hide small inconveniences of FFI interop (which are already solved using the FFI!) is short-sighted.

Risto-Stevcev commented 6 years ago

OK, so I ran those benchmarks and included Immutable.js as well, which uses a trie internally for efficient updates:

Timings:

Reason: using BuckleScript Records/Lists                    : 1306.193ms
JS:     using Object.assign                                 : 13227.534ms

Timings of unfair/cheating variants
JS:     using Object and manual key mapping (brittle code!) : 827.233ms
JS:     using Object mutation (no immutability!)            : 533.067ms
JS:     using immutablejs                                   : 29448.573ms

Oddly enough, immutablejs is the slowest one, even though it was written specifically to be performant (otherwise you could just use concat or assign). This benchmark is biased toward object construction, so we never see the update perf gains immutablejs was built for. So it's not the greatest benchmark.

I dug a little deeper. What happens if I make this code more realistic? In a realistic situation you would run friends once:

Timings:

Reason: using BuckleScript Records/Lists                    : 1.754ms
JS:     using Object.assign                                 : 0.633ms

Timings of unfair/cheating variants
JS:     using Object and manual key mapping (brittle code!) : 0.513ms
JS:     using Object mutation (no immutability!)            : 0.407ms
JS:     using immutablejs                                   : 8.126ms

Interesting: Object.assign is faster than the Reason implementation for normal use cases! In fact, all of them except immutablejs are substantially faster. But does it really matter which one you choose? You won't notice an 8ms difference as an end user.

Though you would never do this in a real app, let's try running it 10 times:

Timings:

Reason: using BuckleScript Records/Lists                    : 2.040ms
JS:     using Object.assign                                 : 1.367ms

Timings of unfair/cheating variants
JS:     using Object and manual key mapping (brittle code!) : 0.778ms
JS:     using Object mutation (no immutability!)            : 0.344ms
JS:     using immutablejs                                   : 10.754ms

Still faster.

So this illustrates my point pretty well. What's the point of keeping this around? It's not a small inconvenience, it's a big inconvenience: you're giving up record syntax, destructuring, etc. It's not short-sighted: readability matters. Maintainability matters. That's the whole point of why we're writing code in FP -- it's higher level than OO programming, which exposes things like pointers and mutation, and both FP and OO are higher level than something like WebAssembly, which requires manual memory management.

And as I said earlier, the more "performant" version can still stick around in another form. And let's be real: you would never do anything like this in a real app, but let's say for some weird reason you end up having to do something like this 1,000,000 times. Would you use this for that? Or would it make more sense to reach for something like WebAssembly? Premature optimization is the root of all evil.

aptfx commented 6 years ago

You still don't get the point. This is not a question of array vs JS object. It is about leaking the record representation vs hiding it. When browsers' performance patterns change (object spread got optimized in newer V8, for example), BS can profit from it.

Or in short: if you want a JS obj why don’t you just allocate one? It IS possible in BS today!
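
An illustration of the flexibility argument (the emitted JS in comments is hypothetical, not actual compiler output):

type point = { x: int; y: int; z: int }
let move p = { p with x = p.x + 1 }

(* array representation:   var q = [p[0] + 1, p[1], p[2]];
   object representation:  var q = {...p, x: p.x + 1};
   with the representation hidden, the compiler is free to switch to
   whichever form current JS engines optimize best *)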

Risto-Stevcev commented 6 years ago

You said that the inconvenience is small to you, and you want to hide the representation. It sounds like we're saying the same thing: that the current record implementation belongs more as an abstract type.

aptfx commented 6 years ago

There is no inconvenience in “using internal BS record structures as JS obj” because this is just always the wrong thing to do. If you want a JS obj then allocate one. It’s just as easy as that.

What you want is the equivalent of defining Rust structs to always use C layout semantics (what #[repr(C)] opts into) instead of only when explicitly asked for. It may be more convenient for you, but it hinders flexibility in the language implementation. BuckleScript is not just a JavaScript transpiler.

cemerick commented 5 years ago

FWIW, I agree entirely that bsc shouldn't make any claims about how records are laid out in the generated JavaScript. A much better argument is that an object-based record layout is nontrivially faster than the current array-based layout, and probably will remain so for a long while to come.

I say this based on this benchmark, which runs in ~9.5s under Node. A dedicated JavaScript implementation of the same workload completes in ~5.5s. (See this and other results here.)

If I manually alter the representation of the Rectangle record in the bsc-generated JavaScript (as here), then it is effectively as fast as the "native" JavaScript baseline, ~5.5s.

A few things:

  1. This representation tradeoff doesn't appear to be particularly VM-specific. Spidermonkey exhibits the same behaviour, and similar timings: ~5s for both JavaScript and object-based bsc output, ~11s for the standard array-based bsc output.
  2. If this change were made, the generated object field names should absolutely not be meaningful; in fact, they should be gensymed to be globally unique (i.e. a field a on record Foo should have a name distinct from a field also named a on record Bar). This enables JITs that use "HiddenClass"-style tracing and code generation to more easily distinguish disjoint runtime types (as described in some detail here). (@jordwalke actually makes this point here FWIW)
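
An illustrative sketch of the gensym suggestion (the mangled output is hypothetical; bsc never emitted this):

type foo = { a: int }
type bar = { a: string }

(* unmangled object output would give both values the shape { a: ... };
   gensymed output would emit globally unique keys instead, e.g.
     var f = { a$foo: 1 };
     var b = { a$bar: "x" };
   so a JIT tracking hidden classes never sees the same property name
   on unrelated runtime types *)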

chenglou commented 5 years ago

@cemerick sure is nice to see you around BuckleScript =)

If we ever compile to JS objects, we'll compile to unmangled object fields. Mangled names might fare better on benchmarks and in certain real-world situations, but we care about how 80% of userland code is written; if we look at those codebases, we'd see lots of record-to-Js.t object conversion functions, many of them inside hot reusable functions or hidden in macros. That would defeat the purpose of such a micro-optimization, arguably by a wide margin.

It might confuse the JIT occasionally, but I think compiling to the same field names is a small luxury we can afford when the entirety of BS is already this fast. Not to mention the interop ergonomics.

yawaramin commented 4 years ago

Looks like this was done in #3932.

TheSpyder commented 4 years ago

yeah, this can probably be closed now