moongazer07 opened 8 months ago
(you can keep commenting on the previous issue btw 😅)
javy uses QuickJS under the hood, which is a JavaScript interpreter that can be run in Wasm. But the JavaScript itself doesn't get compiled to Wasm, only embedded in it as text.
I used it before to evaluate some parts of the obfuscated code, but the reason was sandboxing, not speed.
As you can see, V8 is roughly 30-100x as fast: https://bellard.org/quickjs/bench.html
what about v8-compile-cache??
(you can keep commenting on the previous issue btw 😅)
For ease of others following along, these are the previous issues that this discussion has been fragmented across:
I haven't evaluated if it would fully cover your needs, but if you haven't already, you might like to look at tree-sitter / web-tree-sitter / etc:
My suggestion here would be a rather major change, so it's not worth looking into too deeply unless it turns out there is no good/efficient way to manage this with the current AST parsers/etc; but one thing I stumbled across/was thinking about the other day was that the swc / tree-sitter / etc Rust parsers can apparently be used from JS apps; and how that might allow us to run the unminify process much faster and/or in a potentially more memory-efficient way.

These links/resources probably aren't exhaustive; but figured I would share them as a starting point in case this was a path that was worth looking into at some stage:
tree-sitter / web-tree-sitter

- https://github.com/tree-sitter/node-tree-sitter
  Node.js bindings for tree-sitter
- https://www.npmjs.com/package/web-tree-sitter
- https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web
  WebAssembly bindings to the Tree-sitter parsing library
- https://crates.io/crates/tree-sitter-javascript
- https://github.com/tree-sitter/tree-sitter-javascript
  JavaScript and JSX grammar for tree-sitter. For TypeScript, see tree-sitter-typescript.
swc / @swc/wasm-web / swc_ecma_parser

- https://swc.rs/
- https://swc.rs/docs/usage/core#parse
- https://swc.rs/docs/usage/wasm
- Some examples of using @swc/wasm-web to parse code to an AST in AST Explorer(s)
- https://github.com/swc-project/swc/discussions/3713
  Running a plugin with @swc/wasm-web
- https://rustdoc.swc.rs/swc/index.html
- https://rustdoc.swc.rs/swc_ecma_parser/
- https://rustdoc.swc.rs/swc_ecma_transforms_base/fn.resolver.html
- https://rustdoc.swc.rs/swc_ecma_minifier/eval/struct.Evaluator.html
- https://www.christopherbiscardi.com/how-to-print-a-javascript-ast-using-swc-and-rust
How to print a JavaScript AST using SWC and Rust
- https://blog.logrocket.com/writing-webpack-plugins-rust-using-swc/
Writing webpack plugins in Rust using SWC for faster builds
- https://github.com/swc-project/swc/discussions/3254
Q: How to manipulate nodes+parent_nodes in AST & generate the tree back to JS code?
- https://play.swc.rs/
- The output can be set to AST
ast-grep

- https://ast-grep.github.io/reference/api.html
  ast-grep currently has an experimental API for Node.js

Etc

- https://github.com/rustwasm/wasm-pack
  This tool seeks to be a one-stop shop for building and working with Rust-generated WebAssembly that you would like to interop with JavaScript, in the browser or with Node.js.
  wasm-pack helps you build Rust-generated WebAssembly packages that you could publish to the npm registry, or otherwise use alongside any JavaScript packages in workflows that you already use.
- https://rustwasm.github.io/wasm-pack/book/
Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/35#issuecomment-1815952802
that the swc / tree-sitter / etc Rust/etc parsers can apparently be used from JS apps
I experimented a while ago with using Rust parsers and converting back to the Babel AST format: https://github.com/coderaiser/swc-to-babel The parsing is faster, but unfortunately transferring all the objects to the JS context has so much overhead that it's slower than the Babel parser. So it should most likely be entirely JS or a native language, not mixed.
Another WIP JS-only attempt with https://github.com/meriyah/meriyah + https://github.com/j4k0xb/estree-to-babel/tree/perf could speed up parsing from 1.5s to 900ms for a large file.
Not sure if it's worth it since parsing takes up like 2% and transforms 98% of the time
I experimented a while ago with using Rust parsers and converting back to the Babel AST format. The parsing is faster, but unfortunately transferring all the objects to the JS context has so much overhead that it's slower than the Babel parser. So it should most likely be entirely JS or a native language, not mixed.
@j4k0xb True true, that would definitely make sense. I haven't spent much time looking deeply into the combination of JS + WebAssembly things, but I do remember there being something about the 'data transfer' between them being somewhat 'expensive'.
How did you go about transferring the objects back to JS, out of curiosity? I wonder if there are any methods that could be used to optimize that in a way that would speed things up; or alternatively, maybe a way to keep the actual AST objects within the Rust side of things, but be able to manipulate them from JS functions/similar?
Another WIP JS-only attempt (meriyah + estree-to-babel) could speed up parsing from 1.5s to 900ms for a large file. Not sure if it's worth it since parsing takes up like 2% and transforms 98% of the time
@j4k0xb Yeah, personally I would think optimising for the transform side of things makes the most sense as well.
Not sure how often it would come up in reality, but one thing I was thinking about was for particularly large bundles, sometimes it might take an unreasonable amount of time and/or memory to try and unbundle them; and so was thinking that in a case like that, perhaps it could make sense to use a rust/etc based tool to first extract them into individual module files (as an example), and then potentially process them in a later 'unminify' step (that could potentially be JS based).
That might not resolve anything anyway though, as the unminify step probably has to load the full JS context into memory to be able to unminify anyway, so it would probably just 'out of memory' at that point if it was going to.
How did you go about transfering the objects back to JS out of curiosity?
It's serialized with serde_json::to_string and deserialized with JSON.parse. Idk how to benchmark how long the native parsing vs serializing takes; here's the combined time for a 1.2MB bundle (380k nodes):
or alternatively, maybe a way to keep the actual AST objects within the rust side of things, but be able to manipulate them from JS functions/similar?
Definitely a possibility, but again the whole project would have to be rewritten in Rust
sometimes it might take an unreasonable amount of time and/or memory to try and unbundle them
Do you have examples where this happens? The current time/memory usage is not that unreasonable imo:
to use a rust/etc based tool to first extract them into individual module files (as an example), and then potentially process them in a later 'unminify' step
I also thought about something similar, where transforms could be performed in parallel for each module. How would that work with other kinds of scripts though? For example, for deobfuscation it's necessary to unminify everything beforehand; a script could contain no bundle, or a bundle in a deeply nested part of the code. Maybe expose unminify/unpack/deobfuscate as their own functions or modules so they can be much more optimized (can merge visitors, unminify doesn't need to crawl the scope)
(I thought I replied to this the other week, but apparently got distracted somewhere along the way.. 😅)
here's the combined time for a 1.2MB bundle (380k nodes)
@j4k0xb True true, that doesn't look too bad all in all then.
Definitely a possibility, but again the whole project would have to be rewritten in rust
@j4k0xb Yeah.. that would be less than ideal.
I don't know how possible it is, or if there are existing libraries/etc that would make it easier, but my original thought in this area was if it was possible to basically 'describe the transformation to be made' on the JS side of things, and then pass that through to the rust side for it to do the 'heavy lifting'.
If that were possible, then it should remove the need to convert the whole project to rust, and also would remove the performance penalty of needing to pass the whole AST back and forth across the rust/JS boundary.
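To make that idea concrete: the JS side would only ever build a small, serializable description of the transformation, and ship that across the boundary once. Everything below is hypothetical (the rule shape is invented, loosely modeled on ast-grep-style pattern/rewrite pairs), and the pure-JS `applyRules` is only a text-based stand-in for what a native engine would do against the AST:

```javascript
// Hypothetical: declarative, fully serializable transform rules.
// Only this small description would cross the JS/Rust boundary, never the AST.
const rules = [
  { pattern: 'void 0', rewrite: 'undefined' },
  { pattern: '!0', rewrite: 'true' },
];

// Pure-JS reference of what the native side would do with the rules.
// (A real engine would match on the AST, not on raw text.)
function applyRules(source, rules) {
  return rules.reduce(
    (src, { pattern, rewrite }) => src.split(pattern).join(rewrite),
    source
  );
}
```

The key property is that `rules` survives JSON.stringify unchanged, so handing it to a native binding costs almost nothing, unlike shipping 380k AST nodes back and forth.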
The 2 ways I was initially pondering that this might be possible were:
There was also one more idea I was pondering, though not sure if it would be impractical due to needing to cross back and forth across the rust/JS boundary (this idea is less thought through than the others above; and realistically could probably work with/alongside them, particularly the DSL idea):
Some related ideas/exploration based on the above potentials was:
tree-sitter's API for matching/walking, and then implementing some kind of transform + codegen on top of that; e.g.: wakaru repo: tree-sitter query API
You can also look at an interesting talk where another project author mentions how they use Tree-sitter for edits: https://www.youtube.com/watch?v=NcUJnmBqHTY
ast-grep, and how they implement their matchers/replacers, either to use directly, or as a concept for building a more generic DSL for it:
A Deep Dive Into ast-grep's Pattern
wakaru repo:
ast-grep currently has an experimental API for Node.js
The benchmark shows that ast-grep can be 3-4 times faster and requires a smaller memory footprint. However, at present, we only have a napi binding that can only run on the CLI. A WASM build is required for the browser environment (playground). I'm not too sure about the future direction of this.
It's important to note that while ast-grep fulfills many of our requirements, we do not plan to rewrite complex rules using it. Complex rules often involve scope awareness, advanced structure matching, and AST node building. Therefore, it's challenging to determine whether the overall performance gains will justify the effort required for integration.
ast-grep's support for wasm builds (ast-grep / tree-sitter)
ast-grep's source code, to understand how their match/replace API works on top of tree-sitter; as it theoretically already provides a DSL that removes the need to cross back and forth over the JS/Rust boundary
boa could be used as an embedded JS interpreter to be able to run JS-based transformation rules 'within' the Rust side of things, and thus avoiding the cross-over back and forth
Boa is an embeddable and experimental Javascript engine written in Rust. Currently, it has support for some of the language.
This is an experimental Javascript lexer, parser and interpreter written in Rust. Currently, it has support for some of the language.
Boa is an experimental Javascript lexer, parser and compiler written in Rust. Currently, it has support for some of the language. It can be embedded in Rust projects fairly easily and also used from the command line. Boa also exists to serve as a Rust implementation of the EcmaScript specification; there will be areas where we can utilise Rust and its fantastic ecosystem to make a fast, concurrent and safe engine.
Adding a JavaScript interpreter to your Rust project
Boa Benchmarks
EcmaScript conformance test results for Boa
Do you have examples where this happens?
@j4k0xb I don't personally have specific examples, and the one I was thinking of was using wakaru rather than webcrack, so it might not even apply here; but here are the details from that issue:
I have a 5Mb sized file that needs to be processed, I've tried running it directly in node and it takes up to 21G of memory space in some transform processing.
Originally posted by @StringKe in https://github.com/pionxzh/wakaru/issues/35#issue-1987139845
I tried the sample with the new CLI.
It took ~2GB and 60 seconds to pass for unpacker (and failed to unpack it into multiple files😞; I will check further!).
Generated 1 modules from main.ba3b216f.js to out (61,265.4ms)
For unminify, it took 3.4 hours... to finish the whole process. Memory usage is from 100MB ~ 1GB ~ 2GB depending on the rule.
I will improve the CLI to write files more frequently during the unminify process, and add a --perf flag for recording the time and memory usage.
Originally posted by @pionxzh in https://github.com/pionxzh/wakaru/issues/35#issuecomment-1818082000
The following may also be of interest:
I wonder if there are any methods that could be used to optimize that in a way that would speed things up; or alternatively, maybe a way to keep the actual AST objects within the rust side of things, but be able to manipulate them from JS functions/similar?
Actually that's how ast-grep works. Here is a quote from Benchmark TypeScript Parsers: Demystify Rust Tooling Performance:
Tree-sitter and ast-grep avoid serde overhead by returning a tree object rather than a full AST structure. Accessing tree nodes requires invoking Rust methods from JavaScript, which distributes the cost over the reading process.
Originally posted by @pionxzh in https://github.com/pionxzh/wakaru/issues/35#issuecomment-1838763473
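The "tree object rather than a full AST structure" approach from that quote can be mimicked in plain JS to see the trade-off: instead of eagerly deserializing everything, the binding hands back a thin wrapper whose accessors fetch data on demand, so the cost is paid only for nodes actually read. A stdlib-only sketch; the `nativeTree` object is an invented stand-in for the real napi binding:

```javascript
// Stand-in for the Rust-side tree; in ast-grep this lives behind a napi binding.
const nativeTree = {
  kind: (id) => (id === 0 ? 'program' : 'identifier'),
  childIds: (id) => (id === 0 ? [1, 2] : []),
};

// Thin lazy wrapper: each property access calls "into Rust" instead of
// paying for a full-AST deserialization up front.
class LazyNode {
  constructor(id) { this.id = id; }
  get kind() { return nativeTree.kind(this.id); }
  get children() { return nativeTree.childIds(this.id).map((c) => new LazyNode(c)); }
}

const root = new LazyNode(0);
```

This is exactly the amortization the article describes: parsing returns almost instantly, but every `node.kind` access crosses the boundary, so heavy tree-walking from JS pays the cost back gradually.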
Actually that's how ast-grep works.
Oh true. I was thinking it probably did, particularly after skimming the source the other day (Ref), but I wasn't 100% sure still.
Here is a quote from Benchmark TypeScript Parsers: Demystify Rust Tooling Performance:
tree-sitter and ast-grep avoid serde overhead by returning a tree object rather than a full AST structure. Accessing tree nodes requires invoking Rust methods from JavaScript, which distributes the cost over the reading process.
That's super interesting and neat to know. For future reference, here's a link to the non-medium-paywalled version of that article (Ref). It was a really interesting read in general!
The results of the benchmarks for synchronous performance were pretty interesting. I definitely didn't expect swc to perform so poorly in general, nor for babel to perform unexpectedly so much better on the medium-sized file compared to most other things (I wonder what made it perform so well there?). It was also interesting to see that ast-grep consistently beat tree-sitter on its own, despite using tree-sitter as its parser.
It was also really interesting in the async parsing to see just how much ast-grep seems to dominate everything; again seeming to perform way better than the tree-sitter parser it's built on top of:
These notes towards the end were also interesting/worth paying attention to:
Native Parser Performance Tricks
tree-sitter & ast-grep's Edge
These parsers manage to bypass serde costs post-parsing by returning a Rust object wrapper to Node.js. This strategy, while efficient, can lead to slower AST access in JavaScript as the cost is amortized over the reading phase.
ast-grep's async advantage:
ast-grep's performance in concurrent parsing scenarios is largely due to its utilization of multiple libuv threads. By default, the libuv thread pool size is set to four, but there's potential to enhance performance further by expanding the thread pool size, thus fully leveraging the available CPU cores.
Cool to see this note at the very end aligns with one of the thoughts I had too!
Shifting Workloads to Rust: The creation of a domain-specific language (DSL) tailored for AST node querying could shift a greater portion of computational work to the Rust side, enhancing overall efficiency.
Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/35#issuecomment-1839675606
If you get deeper into benchmarking/optimisation of the Rust side of things, this was a good read:
Which I summarised the main points as:
The tl;dr of that blog seemed to be that they first used flamegraph, which was mildly useful, but not specifically detailed enough:
- https://github.com/flamegraph-rs/flamegraph
A Rust-powered flamegraph generator with additional support for Cargo projects! It can be used to profile anything, not just Rust projects!
And then moved on to using miri and a less-obvious feature of it to generate a far more detailed trace, which could then be analysed within Google Chrome's DevTools performance tab:
- https://github.com/rust-lang/miri
An experimental interpreter for Rust's mid-level intermediate representation (MIR)
- https://github.com/rust-lang/miri#miri--z-flags-and-environment-variables
  -Zmiri-measureme=<name> enables measureme profiling for the interpreted program. This can be used to find which parts of your program are executing slowly under Miri. The profile is written out to a file inside a directory called <name>, and can be processed using the tools in the repository https://github.com/rust-lang/measureme
- https://github.com/rust-lang/measureme#crox
  crox turns measureme profiling data into files that can be visualized by the Chromium performance tools.
- https://github.com/rust-lang/measureme/tree/master/crox#readme
Originally posted by @0xdevalias in https://github.com/ast-grep/ast-grep/issues/144#issuecomment-1839767054
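Pieced together from the flags quoted above, the workflow would look roughly like this. It's an untested CLI sketch: the trace name `unminify` is arbitrary, and the exact crox invocation may differ (see the measureme README):

```shell
# Profile the interpreted program under Miri, writing measureme data
# into a directory named "unminify"
MIRIFLAGS="-Zmiri-measureme=unminify" cargo +nightly miri run

# Convert the measureme output into a Chrome-trace file with crox
crox unminify

# Then load the resulting chrome_profiler.json into the Performance tab
# of Chrome's DevTools (via "Load profile")
```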
So a transpiler that supported every js feature would still be slower overall.
https://www.assemblyscript.org is very similar to JS/TS, but would also require rewriting or substantially modifying all libs, in which case using another language is most likely a better choice.
Originally posted by @j4k0xb in https://github.com/j4k0xb/webcrack/issues/23#issuecomment-1806459574
found this: https://www.wasm.builders/gunjan_0307/compiling-javascript-to-wasm-34lk