major cleanup, performance improvement due to removal uf u16string usage, fixed copy-by-value to be copy-by-const-ref

apfeltee commented 2 years ago

The version (when i forked it) had some major issues, dominantly the unnecessary use of std::u16string, copying of values due to copy-by-value signatures (void foo(std::string) vs void foo(const std::string&)).

I also silenced most of the well over 500 warnings about unused variables, so it compiles much cleaner with -Wall -Wextra now.

And, I added some pretty rudimentary REPL support. it works, but it doesn't print the entered resulting value yet.
For commandline parsing, I added my own command line parser (optionparser.h).

I think this has potential, but there are definitely some things that need addressing in the long run:

Array indices are currently strings. This is a major performance bottleneck, and even clever optimizing will never be as good as plain integer indices. Seems the same is true for strings.
Use of global variables is going to make it difficult to make es thread-safe.
I'm not entirely sure what the idea behind the use if u16string is/was, but using u16string makes every string much larger than they usually need to be. Unicode is only needed for Value strings, but very definitely not in the parser, or most of the runtime.
Some manner of bytecode machine may be necessary to improve performance. This isn't a pressing issue, but implementing a JIT will inevitably also require bytecode representation.

For some mild benchmarking, take a look at the mandel*.js files. Even without optimization, they run much faster now.

I get that my changes are rather extensive, but I hope you see what I mean.

zhuzilin commented 2 years ago

Thank you for your PR! However, es is still in the very early stage of the development and major refactor happens on a daily basis. I'm afraid it is not ready for such a large PR at the moment. But still thank you so much for trying to make es better!

I also silenced most of the well over 500 warnings about unused variables, so it compiles much cleaner with -Wall -Wextra now.

Thank you for this suggestion. I didn't use -Wall -Wextra during development and I'll fix those errors gradually :)

Array indices are currently strings. This is a major performance bottleneck, and even clever optimizing will never be as good as plain integer indices. Seems the same is true for strings.

I agree. The "converting number index to string" was me simply following the ES5 spec. I'm considering refactoring the data structure of the properties according to the practive in V8, which will separate the string and integer indices.

Use of global variables is going to make it difficult to make es thread-safe.

As you can see from the code, there are still a bunch of optimization to be done before going multi-threading. So es will just be single thread at the moment. I'll come back to this design when multi-threading is needed.

I'm not entirely sure what the idea behind the use if u16string is/was, but using u16string makes every string much larger than they usually need to be . Unicode is only needed for Value strings, but very definitely not in the parser, or most of the runtime.

The ES5 spec explicitly specifies the source code as a stream of 16 bit characters:

ECMAScript source text is assumed to be a sequence of 16-bit code units for the purposes of this specification. Such a source text may include sequences of 16-bit code units that are not valid UTF-16 character encodings. If an actual source text is encoded in a form other than 16-bit code units it must be processed as if it was first convert to UTF-16.

This the reason why I used u16string everywhere in the code, and this enable es to do something like:

var 你好 = 1

Some manner of bytecode machine may be necessary to improve performance. This isn't a pressing issue, but implementing a JIT will inevitably also require bytecode representation.

I can't agree more! Design a simple and good enough bytecode was one of the reason why I start this project. After making the current AST interpreting version works, I'll start to think about the design. A very raw idea is to extract a subset from the register based bytecode from that of V8 and JSC. Any thought you'd like to share would be appreciated!

For some mild benchmarking, take a look at the mandel*.js files. Even without optimization, they run much faster now.

Currently I'm using Octane typescript test as the benchmark. And I'll definitely give the mandel*.js files a try.

Thank you again for your great suggestions!

apfeltee commented 2 years ago

I understand that the pull is a bit too much, but I appreciate you considering my recommendations.

The ES5 spec explicitly specifies the source code as a stream of 16 bit characters:

I didn't know that. From the point of view of building compilers/interpreters, this seems really odd. But then, it is javascript...
Anyway, even so, I believe this is possible without necessarily using u16string everywhere, as strings (be it const char*, std::string, or even std::vector<....>) are just linear arrays of bytes, and I think that at least in the heavy-duty steps of parsing (tokenizing/lexing) it might be possible to do so without using such a large string type.
(now I get why v8 is so large...)

A very raw idea is to extract a subset from the register based bytecode from that of V8 and JSC

A good idea, perhaps better than reinventing the wheel. Sadly, I don't have much experience when it comes to bytecode compilation...

I'm also closing this pull, because as you said, it's too big (and due to merge conflicts)

zhuzilin / es

major cleanup, performance improvement due to removal uf u16string usage, fixed copy-by-value to be copy-by-const-ref #2