mike-lischke / ANTLRng

A Typescript port of the ANTLR4 Java tool
MIT License

Development planning, Milestone 1 #5

Closed mike-lischke closed 11 months ago

mike-lischke commented 1 year ago

This is the initial major step of the entire project. It lays the groundwork for all following steps. It consists of these points:

matthew-dean commented 1 year ago

Oh! There would be a different distributed antlr4 runtime? That's pretty awesome. Are you testing with https://github.com/antlr/antlr4/releases/tag/4.12.0 since that's primarily a TypeScript release? Having e2e TypeScript would be rad af.

matthew-dean commented 1 year ago

This is probably a bonkers idea, but another thing to consider is AssemblyScript, would allow both a TypeScript-compatible build AND a WASM runtime.

mike-lischke commented 1 year ago

Yes, the first step is a complete TS runtime, not just type definitions over a JS runtime (like what is currently in ANTLR4). Pretty much like antlr4ts, but for ANTLR 4.12.

AssemblyScript is something I have had in mind for quite some time already, but it still has some limitations, like no union types, only simple interface support, no promises...

matthew-dean commented 1 year ago

@mike-lischke What about just updating the JavaScript runtime to a TypeScript one in the Antlr project itself? Or a type-checked JSDoc? It already gets transpiled / packed with Webpack. You don't think they'd be open to it?

mike-lischke commented 1 year ago

Doing that would mean living with its limitations. That runtime is not complete, and since it is JS, there's no type checking within the runtime. I prefer a full TS runtime, not just typings.

matthew-dean commented 1 year ago

You can type-check within a JS source with JSDoc. There are some popular packages that do so, but I guess yeah, if there are other limitations and places where it's incomplete, that makes sense.

In what ways is it incomplete? Is it documented? I do know that the majority of the .d.ts files are incorrect, and so TypeScript will complain if you try to import one of the .js files directly.

Another thing just browsing the JavaScript runtime source, it doesn't look to be very tree-shakeable. A lot of the files are grouped into objects and re-exported to their parent, instead of being distinct imports / exports. It would be nice that if there are parts of the Antlr runtime you don't use, it wouldn't end up in a bundle.

mike-lischke commented 1 year ago

For the differences, just compare the two runtime folders of Java and JavaScript. I haven't yet pushed the new TS runtime, as I'm in the middle of porting antlr4-c3, but it will look exactly like the Java runtime, just with TypeScript files.

The JS runtime contains only the minimal required number of files, to allow parsing input. And the type definitions are an even smaller layer on top of that. Parsing something is possible, but that's it. Tools which need more of the support classes (like the vocabulary) cannot use it.

About tree-shaking: I'm not sure how much the structure influences bundling here, but the runtime is not very big. For me, though, bundle sizes are not of very high interest. Today's machines and connections are fast enough to transport megabytes easily. Functionality weighs more IMO. In another project I work on I have bundles with these sizes (using rollup.js):

build/assets/rdbms-info-4348324e.js 13.24 kB │ gzip: 2.86 kB
build/assets/keywords-1d566755.js 23.47 kB │ gzip: 3.85 kB
build/assets/builtin-functions-6a8e8352.js 33.98 kB │ gzip: 8.80 kB
build/assets/system-functions-f32d6e43.js 75.58 kB │ gzip: 16.37 kB
build/assets/tabulator-tables-906d6f2f.js 387.27 kB │ gzip: 89.51 kB
build/assets/system-variables-a8dea4ec.js 822.19 kB │ gzip: 94.68 kB
build/assets/index-6d8dd2b8.js 893.82 kB │ gzip: 223.70 kB
build/assets/monaco-editor-84b8038e.js 3,357.21 kB │ gzip: 836.85 kB
build/assets/dependencies-45be8b40.js 4,199.15 kB │ gzip: 1,108.13 kB

but I have yet to see how large the new TS runtime and the JREE end up when bundled. They will go into that mentioned app to replace antlr4ts there (which is located in that dependencies bundle).

matthew-dean commented 1 year ago

@mike-lischke

it will look exactly like the Java runtime, just with Typescript files

That is really great news! Let me know if there are ways others can contribute. I'm currently working on a more robust and complete CSS parser using Antlr, and pre-processor grammars that properly extend it, and after I built the grammar and was trying to build tests, I immediately was running into problems with incomplete / missing types in the JavaScript runtime. I proposed this solution to fix the mess of types, but I think yours is a much better one, so I closed it.

mike-lischke commented 1 year ago

The current focus is not so much on the TS ANTLR4 tool, but on the new TS runtime. I'm currently converting some of the JDK tests to verify that the JRE emulation works well. While debugging the antlr4-c3 package I found a few problems, which I have to fix first. After that I can continue to make parsing work (the lexer already works).

There are so many areas in this group of projects which need improvement. Just pick something you like and file PRs. Filing bugs is probably not a good idea, as this is too early in the process. These are the current main topics:

Note: I have not yet uploaded the new runtime, as there's no Node package yet, but I can create the repo on GitHub if you want to spend some time porting your CSS parser to it and debugging the issues. The new TS runtime builds fine already and is complete, but needs some fine tuning. Or maybe rather the JREE that provides the runtime for the runtime ... :-)

matthew-dean commented 1 year ago

I can create the repo in Github if you want to spend some time porting your CSS parser to it and debug the issues

If the TS runtime is feature-complete (albeit with possible bugs), I'm happy to try it out and do that. I'd only just started with the JS/TS-API side of my parser(s) i.e. I hadn't gotten very far with the JS/TS runtime in the Antlr repo (mostly because I immediately ran into roadblocks like the DefaultErrorStrategy neither being exported at the root nor having correct TS types when I tried to import it from source).

mike-lischke commented 1 year ago

As mentioned before, the new ANTLR4 TS runtime is probably in pretty good shape already, but the underlying JRE emulation layer still requires some significant work. This repository already contains the fully cleaned up TS runtime, but it still needs an update, which I will do next weekend if all goes well. You could then start debugging it, but be warned: this will probably mean stepping through the ANTLR4 code just to find that something in the underlying JREE lib is wrong.

I will hopefully soon have the new list iterator tests running for the JREE, to eliminate a block of problems I found while testing the TS runtime.

mike-lischke commented 1 year ago

OK, I just pushed the latest code, but don't expect this to fully build. I really need to update the jree Node package for that, which in turn requires that I finish the JDK tests I have converted.

matthew-dean commented 1 year ago

Hmm, what do you mean, JRE emulation? The TS runtime runs a Java emulator? 🤔

mike-lischke commented 1 year ago

For this project I decided to follow a different idea. Instead of porting the code and implementing the necessary Java infrastructure (lists, file system etc.) in a way that is only usable in the TS runtime, I provide a JRE emulation layer, which implements the Java APIs as TS classes. This provides a generally usable TypeScript "J"RE, not just one that can be used in the ANTLR4 runtime.
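To illustrate the idea, here is a minimal sketch of what one such emulation class might look like. All class and interface names below are hypothetical stand-ins chosen for this example; they are not taken from the jree package and only cover a tiny, read-only subset of the Java API.

```typescript
// Hypothetical sketch: illustrative only, NOT the jree package's actual API.

/** Mirrors the shape of java.util.ListIterator (read-only subset). */
interface ListIterator<T> {
    hasNext(): boolean;
    next(): T;
    hasPrevious(): boolean;
    previous(): T;
}

/** Stand-in for java.util.NoSuchElementException. */
class NoSuchElementException extends Error {}

/** Stand-in for java.lang.IndexOutOfBoundsException. */
class IndexOutOfBoundsException extends Error {}

/** A tiny subset of java.util.ArrayList, backed by a native array. */
class ArrayList<T> {
    private readonly items: T[] = [];

    public add(element: T): boolean {
        this.items.push(element);
        return true; // Java's ArrayList.add(E) always returns true.
    }

    public get(index: number): T {
        if (index < 0 || index >= this.items.length) {
            throw new IndexOutOfBoundsException(`Index: ${index}, Size: ${this.items.length}`);
        }
        return this.items[index];
    }

    public size(): number {
        return this.items.length;
    }

    public listIterator(): ListIterator<T> {
        const items = this.items;
        let cursor = 0; // Position between elements, as in Java's ListIterator.
        return {
            hasNext: () => cursor < items.length,
            next: () => {
                if (cursor >= items.length) {
                    throw new NoSuchElementException();
                }
                return items[cursor++];
            },
            hasPrevious: () => cursor > 0,
            previous: () => {
                if (cursor <= 0) {
                    throw new NoSuchElementException();
                }
                return items[--cursor];
            },
        };
    }
}

// Converted Java code can keep its original call pattern.
const tokens = new ArrayList<string>();
tokens.add("ID");
tokens.add("INT");

const iter = tokens.listIterator();
const seen: string[] = [];
while (iter.hasNext()) {
    seen.push(iter.next());
}
```

Because the class keeps the Java method names and exception behaviour, machine-translated code that calls `add`, `get`, `size` or `listIterator` needs no hand editing at the call sites, which is the point of emulating the API instead of rewriting every caller against native arrays.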

In parallel I'm working on a Java to TypeScript converter, which I used to machine-translate the ANTLR4 Java runtime to TS. When translating between languages, the biggest problem is the runtime used by the source language, which is not available for the target language. For converted Java code I created the JREE. And because this is a clean-room implementation (read there what that means), I heavily rely on JDK tests (which I also converted using java2ts).

Of course it is out of scope to convert the entire JRE. Instead I only convert what I need for the ANTLR4 TS runtime (plus a few things needed by the tests). This approach has matured for a while and I'm now in the process of integrating the JREE and the new TS runtime in a real project (antlr4-c3). While doing that I found a number of problems, which I handled by adding the associated JDK tests to the JREE tests and fixing the issues found by them.

Yes, this means juggling 4 projects at the same time, but the java2ts project is meanwhile in good shape, so that I don't need much extra work for it. And the other ones clearly depend hierarchically on each other: finish the ported JDK tests (and fix issues found by that) -> use the new TS runtime and fix issues found in it -> fix the library in which the runtime is used (here antlr4-c3). You could instead use your CSS parser.

matthew-dean commented 1 year ago

@mike-lischke I did try the TS runtime and yeah, it doesn't build yet.

Meanwhile, though, the "official" TS runtime doesn't work either. 😞

mike-lischke commented 1 year ago

To be honest, I haven't pushed the latest code yet, as I'm using it in an isolated fashion for another project. I'm in the middle of porting my antlr4-c3 to the new ANTLR runtime and have made quite some progress. Only the very large C++ grammar currently fails to parse (it has a stack overflow). So I think I'm pretty close to having a full ANTLR4 TypeScript runtime.

Tbh, I'm not sure if I should publish the new TS runtime as part of the ANTLR4 tool or separately. Currently I'm leaning towards a separate Node package, as that is how the TS runtime will be consumed anyway.

matthew-dean commented 1 year ago

@mike-lischke Fair enough! I'll be eager to try it when it's published!

trmjoa commented 1 year ago

Hi @mike-lischke, are there any rough estimates for when you will publish a package (a beta, for that matter)? I am in a situation where I would like to use the antlr4-c3 package, but I can't use the current antlr4ts package because it depends on an outdated antlr4 version.

Your project seems like a perfect fit.

mike-lischke commented 1 year ago

Hi @trmjoa, estimates are always difficult to give. It would be easier if there were more people working on the project. However, I'm pretty sure there will be nothing usable in June (and probably July) yet.

trmjoa commented 1 year ago

I will reach out to you by email.

mike-lischke commented 11 months ago

@trmjoa In case you didn't see it: the new ANTLR4 TS runtime is now available as antlr4ng on NPM.

With that in place this milestone is finished.