Open 0xdevalias opened 11 months ago
Introducing module graph: Like Webpack and other bundlers, a module graph can help us unminify/rename identifiers and exports from bottom to top.
@pionxzh This sounds like an awesome idea!
Based on 1, the steps gonna be like [unpacked] -> [???] -> [unminify]. This new step will build the module graph, do module scanning, rename the file smartly, and provide this information to unminify.
@pionxzh I've only thought about this a little bit, and it depends on how 'all encompassing' you want the module graph to be, but I think it might even make sense for it (or some other metadata/graph) to capture the mapping from original files -> unmapped as well.
--
For some background context (to help understand some of the things I describe for the graph later on below), the workflow I've been thinking about/following for my own needs would probably be as follows:
- [...] raw/ (Ref)
- [...] raw/ by stripping the hashes from the filenames/etc, run prettier on them, and save in unpacked-stage1; I also manually figure out if any chunks have changed their identifier, and remove any chunks from the old build that no longer exist in the new build (Ref: 1, 2)
- wakaru:
  - [...] unpacked-stage1/, and save them into unpacked-stage2/
  - [...] unpacked-stage2/, and save them in unminified
While that workflow might be overkill for a lot of people, I like that it allows me to keep the outputs of each of the 'intermediary steps' available, and can cross reference between them if/as needed. I might find that as I start to use this more, that I don't find it useful to keep some of those intermediate steps; but at least for now, that is my workflow.
--
Now with that background context, going back to my thoughts about the graph/etc; I think it would be useful to be able to have a graph/similar that shows:
- a1-b1-c1-ha-sh/_buildManifest.js contains chunk files ["filefoo-abc123.js", "etc.js"] (Ref)
- a1-b1-c1-ha-sh/_ssgManifest.js contains chunk files ["ssgbar-abc123.js", "ssg-etc.js"] (Ref)
- webpack-a2b2c2hash.js contains chunk files ["aaaa-bbbb.js", "etc.js"] (Ref)
- filefoo-abc123.js contains chunk [1337, ...]
- chunk 1337 contains modules [1, 3, 7, 24]
- modules [1, 3, 7, 24] map to the files ["module1.js", "aUsefulName.js", "a/path/and/a/reallyUsefulName.js", "module24.js"]
And then the actual 'internal module mapping' stuff of what imports/exports what, etc.
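To make the hierarchy above a bit more concrete, here is a minimal sketch of how it might be stored and queried. The object shape and the traceModule helper are hypothetical (not anything wakaru provides), and it assumes that chunk 1337 contains modules [1, 3, 7, 24] and that those modules are saved under the listed filenames.

```javascript
// Hypothetical metadata shape; names and IDs are the placeholder values from the list above.
const buildGraph = {
  // manifest / entry file -> chunk files it references
  manifests: {
    "a1-b1-c1-ha-sh/_buildManifest.js": ["filefoo-abc123.js", "etc.js"],
    "a1-b1-c1-ha-sh/_ssgManifest.js": ["ssgbar-abc123.js", "ssg-etc.js"],
    "webpack-a2b2c2hash.js": ["aaaa-bbbb.js", "etc.js"],
  },
  // chunk file -> chunk IDs it contains
  chunkFiles: {
    "filefoo-abc123.js": [1337],
  },
  // chunk ID -> module IDs it contains (assumed relationship)
  chunks: {
    1337: [1, 3, 7, 24],
  },
  // module ID -> filename the unpacked module was saved under (assumed relationship)
  moduleFiles: {
    1: "module1.js",
    3: "aUsefulName.js",
    7: "a/path/and/a/reallyUsefulName.js",
    24: "module24.js",
  },
};

// Walk the hierarchy backwards: module ID -> chunks -> chunk files -> manifests.
function traceModule(graph, moduleId) {
  const chunkIds = Object.keys(graph.chunks)
    .filter((chunkId) => graph.chunks[chunkId].includes(moduleId))
    .map(Number);
  const chunkFiles = Object.keys(graph.chunkFiles).filter((file) =>
    graph.chunkFiles[file].some((chunkId) => chunkIds.includes(chunkId))
  );
  const manifests = Object.keys(graph.manifests).filter((manifest) =>
    graph.manifests[manifest].some((file) => chunkFiles.includes(file))
  );
  return { file: graph.moduleFiles[moduleId], chunkIds, chunkFiles, manifests };
}

const trace = traceModule(buildGraph, 7);
console.log(trace);
```

Keeping each level as its own lookup table means a renamed chunk file or a re-hashed manifest only touches one table, which may make diffing between builds easier.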
I'm not sure exactly how to map the data, but I would probably start with identifying the main 'types' involved, and what makes sense to know/store about each of them. The following might not be complete, but it's what I came up with from a 'first pass':
[...] webpack.js chunk seems a bit special I think?) (Ref)
This 'metadata file' / graph / etc could then potentially also include the stuff I've talked about before (Ref) for being able to 'guide' the variable/function/etc names used during unminification.
--
I haven't thought deeply through the above yet; it might turn out that some of the things I described there might make sense being split into 2 different things; but I wanted to capture it all while it was in my head.
In the module graph, we can have a map for all exported names and top-level variables/functions, which also allows the user to guide the tool to improve the mapping.
Module graph also brings the possibility of cross-module renaming. For example, un-indirect-call shall detect some pattern and rename the minified export name back to the real name.
I like the idea of "AST fingerprinting". This can also be used in module scanning to replace the current regex implementation.
@pionxzh Definitely. Though I (or you, or someone) need to dig into the concepts a bit more and figure out a practical way to implement it; as currently it's sort of a theory in my mind, but not sure how practical it will be in reality.
Created a new issue for that exploration:
I was wanting to visualize the dependencies between my unminified modules, and stumbled across this project:
madge: Create graphs from your CommonJS, AMD or ES6 module dependencies
It mentioned two of its dependencies, which sound like they could potentially be useful here:
dependency-tree: Get the dependency tree of a module
filing-cabinet: Get the file location associated with a dependency/partial's path
The object form is a mapping of the dependency tree to the filesystem, where every key is an absolute filepath and the value is another object/subtree.
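As a rough illustration of that 'object form', the sketch below flattens a nested tree (absolute filepath keys, subtree values) into a list of [importer, imported] edges, which is closer to what a graph needs. The sample tree and the treeToEdges helper are made up for illustration; nothing here calls the actual libraries.

```javascript
// Flatten a nested { filepath: subtree } object into [importer, imported] edges.
function treeToEdges(tree, parent = null, edges = []) {
  for (const [file, subtree] of Object.entries(tree)) {
    if (parent !== null) edges.push([parent, file]);
    treeToEdges(subtree, file, edges);
  }
  return edges;
}

// Hypothetical tree in the nested object form described above.
const sampleTree = {
  "/src/app.js": {
    "/src/a.js": { "/src/c.js": {} },
    "/src/b.js": {},
  },
};

const edges = treeToEdges(sampleTree);
console.log(edges);
```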
Off the top of my head, I think the 'high level' module-graph within wakaru would probably make the most sense to be linked based on the module IDs, rather than the actual import/exports / module filenames. That way it would be more robust/not need to change as things are renamed/moved around/etc. So these libraries may not be super useful 'as is' for this.
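A minimal sketch of what 'linked by module ID' could look like: edges only ever reference IDs, and the filename is just an attribute that can change freely. The ModuleGraph class and the "pages/_app.js" name are hypothetical illustrations, not wakaru's API.

```javascript
// Edges are stored by module ID; names are mutable labels on the nodes.
class ModuleGraph {
  constructor() {
    this.modules = new Map(); // moduleId -> { name, imports: Set<moduleId> }
  }

  addModule(id, name = `module-${id}.js`) {
    if (!this.modules.has(id)) {
      this.modules.set(id, { name, imports: new Set() });
    }
    return this.modules.get(id);
  }

  addImport(fromId, toId) {
    this.addModule(toId);
    this.addModule(fromId).imports.add(toId);
  }

  // Renaming touches only the node label, never the edges.
  rename(id, newName) {
    this.addModule(id).name = newName;
  }

  // Resolve the current names of everything a module imports.
  importNames(id) {
    const node = this.modules.get(id);
    return node ? [...node.imports].map((dep) => this.modules.get(dep).name) : [];
  }
}

const graph = new ModuleGraph();
graph.addImport(9869, 68502);
const before = graph.importNames(9869);
graph.rename(68502, "pages/_app.js"); // hypothetical recovered name
const after = graph.importNames(9869);
console.log(before, after);
```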
Some useful commands for visualising module dependencies:
# Get the module dependencies as a static .svg image
madge --image graph.svg path/src/app.js
# Get the module dependencies as a graphviz DOT file
madge --dot path/src/app.js > graph.gv
# Get the module dependencies as json
madge --json path/src/app.js > dependencies.json
The graphviz dot output can then be further explored through an interactive tool such as:
Interactive Graphviz Dot Preview for Visual Studio Code
If there are missing dependencies, these are worth noting for how to see/improve it:
In addition to the above, a couple of other 'dependency graph' viewers I came across when I was looking for tools for this today:
vscode-dependencyGraph A plugin for vscode to view your project's dependency graph
I haven't deeply looked into this, and not for ages, but at one stage I remember having a thought that the chunks specified the other chunks they depended on somewhere (as well as the individual module imports within it) (Ref)
In the code I was most exploring, there's the _buildManifest.js (Ref) and webpack.js (Ref) chunks that seemed to detail some of the 'high level' of the chunk loading/dependencies/etc; though there were also the chunks loaded directly in the html as well.
Looking at a fairly small/basic chunk, it seems like it doesn't have anywhere that specifies dependencies on other chunks (Ref)
But then looking at a far larger chunk file (pages/_app.js (Ref)), there is this section after all of the normal module definitions that looks like it might handle loading other chunks if they aren't already loaded, and module dependency order or similar:
function (U) {
  var B = function (B) {
    return U((U.s = B));
  };
  U.O(0, [774, 179], function () {
    return B(18992), B(9869), B(76281);
  }),
    (_N_E = U.O());
},
Originally posted by @0xdevalias in https://github.com/j4k0xb/webcrack/issues/30#issuecomment-1868383435
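One way this chunk-level information might be pulled out without parsing is to run the entry section against a recording stand-in for the runtime. The sketch below assumes that U is the webpack require function and that U.O(result, chunkIds, callback) defers the callback until the listed chunks are loaded; that reading is an interpretation of the snippet, and recordEntry is a hypothetical helper.

```javascript
// Declared so the snippet's `_N_E = ...` assignment also works in strict mode.
var _N_E;

// Entry section as it appears in the chunk above.
const entrySection = function (U) {
  var B = function (B) {
    return U((U.s = B));
  };
  U.O(0, [774, 179], function () {
    return B(18992), B(9869), B(76281);
  }),
    (_N_E = U.O());
};

// Run the entry section with a mock require that records what it asks for.
function recordEntry(entryFn) {
  const record = { chunkIds: [], entryModuleIds: [] };
  const mockRequire = (moduleId) => {
    record.entryModuleIds.push(moduleId);
    return {};
  };
  mockRequire.O = (result, chunkIds, callback) => {
    if (chunkIds === undefined) return result; // bare U.O() call
    record.chunkIds.push(...chunkIds); // chunks assumed to be required first
    return callback(); // run immediately so the module IDs get recorded
  };
  entryFn(mockRequire);
  return record;
}

const entryRecord = recordEntry(entrySection);
console.log(entryRecord);
```

Executing untrusted bundle code like this is only reasonable in a sandbox; a static AST match on the same pattern would be the safer route.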
Another pattern I just noticed, in _app.js (Ref), presumably Next specific:
// module-9869.js
(window.__NEXT_P = window.__NEXT_P || []).push([
  "/_app",
  function () {
    return require(68502);
  },
]);
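If that pattern holds, the route -> module ID link could be captured in a similar way. This is only a sketch: the factory signature (in particular passing window in as a parameter) and the collectRoutes helper are assumptions made so the example is self-contained.

```javascript
// Module shaped like module-9869.js above; `window` is injected (an assumption).
function module9869(module, exports, require, window) {
  (window.__NEXT_P = window.__NEXT_P || []).push([
    "/_app",
    function () {
      return require(68502);
    },
  ]);
}

// Run a module factory, then invoke each registered loader to see what it requires.
function collectRoutes(factory) {
  const fakeWindow = {};
  const requested = [];
  const recordingRequire = (id) => {
    requested.push(id);
    return {};
  };
  factory({ exports: {} }, {}, recordingRequire, fakeWindow);

  const routes = {};
  for (const [route, loader] of fakeWindow.__NEXT_P || []) {
    requested.length = 0; // reset before each loader so IDs don't mix
    loader();
    routes[route] = [...requested];
  }
  return routes;
}

const routes = collectRoutes(module9869);
console.log(routes);
```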
Not 100% sure, but Webpack's stats.json file sounds like it might be relevant here (if not directly, then maybe as a source of inspiration):
Even more tangentially related to this, I've pondered how much we could 're-construct' the files necessary to use tools like bundle analyzer, without having access to the original source (or if there would even be any benefit to trying to do so):
- https://github.com/webpack-contrib/webpack-bundle-analyzer
Webpack plugin and CLI utility that represents bundle content as convenient interactive zoomable treemap
- https://github.com/webpack-contrib/webpack-bundle-analyzer#usage-as-a-cli-utility
You can analyze an existing bundle if you have a webpack stats JSON file.
You can generate it using BundleAnalyzerPlugin with generateStatsFile option set to true or with this simple command: webpack --profile --json > stats.json
- https://webpack.js.org/api/stats/
Stats Data When compiling source code with webpack, users can generate a JSON file containing statistics about modules. These statistics can be used to analyze an application's dependency graph as well as to optimize compilation speed.
- https://nextjs.org/docs/pages/building-your-application/optimizing/bundle-analyzer
My gut feel is that we probably can figure out most of what we need for it; we probably just can't give accurate sizes for the original pre-minified code, etc; and the module names/etc might not be mappable to their originals unless we have module identification type features (see https://github.com/pionxzh/wakaru/issues/41)
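As a very rough sketch of that idea, the function below builds a stats-like object from unpacked chunks, using the length of the unpacked source as a stand-in for size. The input shape and toStatsLike are hypothetical, only a handful of fields are filled in, and whether a tool like webpack-bundle-analyzer would accept the result is untested.

```javascript
// Build a partial, stats.json-like object from unpacked chunks.
// Input shape (hypothetical): { chunkId: { file, modules: { moduleId: source } } }
function toStatsLike(unpackedChunks) {
  const stats = { chunks: [], modules: [] };
  for (const [chunkId, chunk] of Object.entries(unpackedChunks)) {
    let chunkSize = 0;
    for (const [moduleId, source] of Object.entries(chunk.modules)) {
      // Size of the code we have, not of the original pre-minified source.
      chunkSize += source.length;
      stats.modules.push({
        id: Number(moduleId),
        name: `./module-${moduleId}.js`, // placeholder until the module is identified
        size: source.length,
        chunks: [Number(chunkId)],
      });
    }
    stats.chunks.push({ id: Number(chunkId), files: [chunk.file], size: chunkSize });
  }
  return stats;
}

const sourceA = "export const a = 1;";
const sourceB = "export const b = 2;";
const statsLike = toStatsLike({
  1337: { file: "filefoo-abc123.js", modules: { 1: sourceA, 3: sourceB } },
});
console.log(JSON.stringify(statsLike, null, 2));
```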
Originally posted by @0xdevalias in https://github.com/0xdevalias/chatgpt-source-watch/issues/9#issuecomment-1974432157
Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/121#issuecomment-1974433150
The Stack Graph / Scope Graph links/references I shared in https://github.com/pionxzh/wakaru/issues/34#issuecomment-2035859278 may be relevant to this issue as well.
There has recently been a new source of discussion around code fingerprinting and module identification over on the humanify repo in this issue:
Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/74#issuecomment-2372650986
See Also