Open indutny opened 6 years ago
Here is an example of what it might look like: https://github.com/indutny/llparse/blob/master/test/fixtures/example.js
If you're thinking of writing some lexer generator, I could definitely get behind that - I've had the same thought more than once - but why LLVM IR specifically and wouldn't we be reinventing ragel?
@bnoordhuis there're probably better solutions out there, but I always wanted to write something that compiles to LLVM IR. Is it a valid excuse?
On a more serious note, I'd like every state implementation to live in a separate procedure that tail-calls other procedures in the most of the cases. This might turn out to be faster than having a single grand dispatch.
Also on the feature list is having a low-level trie implementation that works seamlessly.
It could probably work in a way similar to Ragel, however it'd have to rely on LLVM to inline the C code into compiled nodes of the state machine graph.
You can't replace http-parser with something that spits out LLVM IR, that leaves too many users out in the cold. Something that compiles to C or has different back-ends could work though (but then you're half-way on the road to reimplementing ragel or re2c.)
a low-level trie implementation
What for?
We obviously need a trie-like state machine to replace hand-written state code for the methods.
Not sure I follow. For lexers, you compute the DFA or NFA and then emit state tables or goto-based code. I guess you could implement it as a trie but I don't know why you would.
I didn't mean in-memory trie, as a data-structure. Rather a compiled code consisting of branches and states linked in a trie-like structure.
Ah, okay. You get that for free with a DFA but I guess a trie is pretty much a DFA encoded as a tree.
I think I'm on the edge of giving up. Just figured out that musttail
may not work on arm.
Nvm, it works! 👍
Took few iterations, but I think I've reached some intermediate milestone here: https://github.com/indutny/llparse/blob/master/test/api-test.js
What do you think, @bnoordhuis ?
Here is what it produces: https://gist.github.com/indutny/ac9eedc036a43098b2ad0bf7b63e9f65
@bnoordhuis better example here: https://github.com/indutny/llparse/tree/master/examples/http
API-wise it looks real nice and the generated code looks tight apart from a hiccup:
$ make -C examples/http
node index.js > http.ll
cc -g3 -Os -flto -fvisibility=hidden -Wall http.ll main.c -o http
warning: overriding the module target triple with x86_64-apple-macosx10.13.0 [-Woverride-module]
1 warning generated.
cannot guarantee tail call due to mismatched parameter counts
%15 = musttail call fastcc i8* @http_parser__invoke_on_complete(%http_parser_state* %0, i8* %14, i8* %2)
cannot guarantee tail call due to mismatched parameter counts
%40 = musttail call fastcc i8* @http_parser__invoke_on_complete(%http_parser_state* %0, i8* %39, i8* %2)
cannot guarantee tail call due to mismatched parameter counts
%10 = musttail call fastcc i8* @http_parser__method(%http_parser_state* %0, i8* %1, i8* %2, i32 0)
LLVM ERROR: Broken module found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [http] Error 1
$ cc -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Do you have plans for a C back-end?
The hiccup is due to a bug in -flto
, see: https://bugs.llvm.org/show_bug.cgi?id=36441 . I've just pushed a fix to a Makefile in that example.
Here is some real work on porting http parser: https://github.com/indutny/llhttp.
I don't really have plans for a C back-end yet, but will likely have to explore this eventually.
Note that despite not being present in the example, I've just introduced an API for creating callbacks using LLVM IR (instead of a reference to C function): https://github.com/indutny/llparse/blob/81578c8f41a926514a6625b2a9c9941218728408/test/fixtures/index.js#L85-L121
After I'll add an API to extend the state structure from the user code, these compiled code chunks will be able to update the state without ever calling the C-land. C-land calls are sort of expensive right now, as they can't be inlined (due to various machine-specific flags that has to be matched).
Additionally, I plan to introduce "mark"s that would work in the same way as they do in http_parser.c
@bnoordhuis and so we got spans (instead of "mark"s): https://github.com/indutny/llhttp/blob/38fb3aed8c7290aec389a109e2e8cd1c004872c4/lib/llhttp/url.js#L12 . They can be interleaved if we'd ever want to, and they should be efficient.
"Getting rid of huge switch statement that isn't optimized well"
Can the switches be permuted to get speedup on average inputs? Perhaps a utility that takes as input a URI corpus file and outputs optimal permutations for the switches?
@chadbrewbaker there's little point in this, switches like the one in http_parser are optimized into jump table.
I know it sounds crazy, but bear with me :wink:
What would you think about rewriting whole project using some JavaScript tool to compile a DSL to a LLVM IR?
I've several reasons for this:
It doesn't sound to complicated, and may finally make our codebase shine.
cc @bnoordhuis @mscdex