vercel / nft

Node.js dependency tracing utility
https://npmjs.com/@vercel/nft
MIT License
1.31k stars 136 forks source link

feat: add `analysis.transformAST` hook #396

Open eduardoboucas opened 7 months ago

eduardoboucas commented 7 months ago

TL;DR: Adds a transformAST callback that lets consumers inspect and/or modify the analysis step.

When processing a code bundle, it's often times useful to statically analyse the code to extract additional information, and potentially modify it to conditionally remove dependencies or perform different types of optimisations.

I know that this is not part of the scope of NFT, and there are a plethora of tools out there that let you do this on top of the dependency tracing step, but lexing and parsing are expensive operations that one would avoiding duplicating if possible.

Since NFT is already doing those operations when analysis is enabled, this PR adds an optional transformAST hook that lets consumers read and optionally modify the AST for each file before dependencies are traced. This callback receives the path and the AST root node.

This callback is supplied via the analysis.transformAST property, and it's an optional hook in line with the existing readFile, stat, readlink, and resolve callbacks.

I tried to do this in the least invasive way, with a minimal change in the implementation. If you have any thoughts on how this could be done different, I'll be happy to make those changes. If you'd rather not add this at all, that's also totally understandable, but I think this could be a useful feature so I wanted to propose it.

Thanks!

eduardoboucas commented 7 months ago

Thanks for the swift reply! I missed the notification somehow. 🤦🏻

I'm not sure this is solving the problem.

I think it does, because said tool can call NFT and use the transformAST hook to access the AST that NFT computed and, optionally, make changes to it before NFT starts processing it.

Instead, it would be better to have a new hook called parseAST() (similar to readFile() but instead of reading the file it can replace the acorn parse).

This would allow you to reuse an existing AST parsed by another tool (assuming its acorn compatible).

I have also considered that option, and I think the shape of that parseAST() hook would indeed be more aligned with the existing hooks.

I didn't go for it in this PR because I think it makes things more complex. NFT is tightly coupled to the parser, as it extends it with additional modules, provides its own set of options, and uses it to not only parse but also detect the module type using multiple parsing runs.

If we allow the parsing to happen outside of NFT, the calling module would need to have acorn and all its surrounding modules as dependencies, with the right versions, with the risk of breaking the tracing process entirely if there's a new version of NFT that has different expectations about the parser. The only way for the calling module to test for this type of regression would be to run NFT's entire test suite with its own parser, which is not trivial.

In contrast, if we keep the parsing in NFT but expose it via the transformAST() hook, the calling module can assert, as part of its test suite, whether the data it receives on that hook has changed between versions of NFT. It has the guarantee that NFT will continue working, because it's using the parser exactly as it needs, and puts the onus on the calling module, which is easier to work with because it lives in the user space. Also, there will be no additional dependencies (and the hassle of managing their versions) in the user code.

I also don't love the idea of exposing the AST because that means we're stuck on acorn forever and can't easily swap for a faster alternative.

With the solution I'm proposing, you could easily swap acorn with anything else, and it would be up to the consumer to adjust their code to use the new AST format you choose.

Having said all that, I can rework the PR to use whatever option you prefer.

Thanks!