Open ivan-collab-git opened 1 month ago
Hey, take a look at these files for reference:
options.bundle
and stops:
https://github.com/j4k0xb/webcrack/blob/404a331efd1fb48b2b3080dd1d0b5b6f0f874854/packages/webcrack/src/unpack/index.ts#L28-L33
Yeah the bundle format looks pretty straight-forward.
I suggest you look through https://github.com/j4k0xb/webcrack/blob/master/CONTRIBUTING.md first and try debugging to get a better idea on how the existing unpacker works (only reading the code may be a bit confusing).
Then start by creating the boilerplate files and use a `{ VariableDeclaration(path) {` visitor in `src/unpack/metro/index.ts` to check if `var __BUNDLE_START_TIME__` is present.
Highly recommend using @codemod/matchers to help with finding AST nodes.
The next easiest thing would be extracting the entry module id from `__r(0);` (could use `path.getAllNextSiblings()`).
Example bundle:
var __BUNDLE_START_TIME__ = this.nativePerformanceNow ? nativePerformanceNow() : Date.now();
var __DEV__ = false;
var process = this.process || {};
var __METRO_GLOBAL_PREFIX__ = "";
process.env = process.env || {};
process.env.NODE_ENV = process.env.NODE_ENV || "production";
(function (global) {
// ...
})(typeof globalThis !== "undefined" ? globalThis : typeof global !== "undefined" ? global : typeof window !== "undefined" ? window : this);
__d(function (global, _$$_REQUIRE, _$$_IMPORT_DEFAULT, _$$_IMPORT_ALL, module, exports, _dependencyMap) {
"use strict";
const lib = _$$_REQUIRE(_dependencyMap[0]);
console.log(lib.foo);
}, 0, [1]);
__d(function (global, _$$_REQUIRE, _$$_IMPORT_DEFAULT, _$$_IMPORT_ALL, module, exports, _dependencyMap) {
"use strict";
exports.foo = "bar";
}, 1, []);
__r(0);
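To make concrete what an unpacker has to recover from the example bundle above (module ids, dependency lists, and the entry id), here is a minimal, dependency-free sketch. It is not how webcrack works and not the AST-visitor approach suggested above: it simply runs the `__d(...)` and `__r(...)` statements against recording stubs. The helper name `collectMetroModules` is hypothetical, and executing bundle code like this is only reasonable for bundles you trust.

```javascript
// Hypothetical helper, not part of webcrack: run the __d(...) / __r(...)
// statements of a bundle against stubs that only record their arguments.
function collectMetroModules(bundleSource) {
  const modules = new Map(); // module id -> { dependencies, factory }
  const entryIds = [];       // ids passed to top-level __r(...) calls

  const __d = (factory, id, dependencies) => {
    modules.set(id, { dependencies, factory });
  };
  const __r = (id) => {
    entryIds.push(id);
  };

  // new Function keeps the stubs local instead of defining real globals.
  // The factories are stored but never called.
  new Function("__d", "__r", bundleSource)(__d, __r);
  return { modules, entryIds };
}

// The __d / __r part of the example bundle (runtime prelude left out).
const exampleBundle = `
__d(function (global, _$$_REQUIRE, _$$_IMPORT_DEFAULT, _$$_IMPORT_ALL, module, exports, _dependencyMap) {
  "use strict";
  const lib = _$$_REQUIRE(_dependencyMap[0]);
  console.log(lib.foo);
}, 0, [1]);
__d(function (global, _$$_REQUIRE, _$$_IMPORT_DEFAULT, _$$_IMPORT_ALL, module, exports, _dependencyMap) {
  "use strict";
  exports.foo = "bar";
}, 1, []);
__r(0);
`;

const { modules, entryIds } = collectMetroModules(exampleBundle);
console.log([...modules.keys()], entryIds); // module ids 0 and 1, entry id 0
```

A real unpacker would read the same three arguments from the AST instead of executing anything, but the data it needs to end up with is the same.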
Great, I'll give it a check!
Hey, what's up. I wrote these notes the other day and was expecting to work on the implementation, but I've had little to no time, so in the meantime I'll share them with you so you can give me a sanity check. I tried to map the concepts from a post I read about the webpack require function, and I think it is pretty similar. Basically the same.
MetroRequire.<hash>.js
The runtime is usually called `MetroRequire.<hash>.js` and it defines these variables:
var __BUNDLE_START_TIME__ = this.nativePerformanceNow
? nativePerformanceNow()
: Date.now(),
__DEV__ = false,
process = this.process || {};
process.env = process.env || {};
process.env.NODE_ENV = process.env.NODE_ENV || "production";
It then calls an IIFE that accepts a single parameter, which is meant to be the global object:
(function(e){ ... })(
"undefined" != typeof globalThis
? globalThis
: "undefined" != typeof global
? global
: "undefined" != typeof window
? window
: this,
);
We will now define five fundamental objects (I'll explain what each one does further on):
The global object. `e` is the only parameter received by the IIFE.
Modules. A loaded module is defined as an object stored in a cache object (the next fundamental object). The cache stores the module as:
{
dependencyMap: o,
factory: e,
hasError: !1,
importedAll: r,
importedDefault: r,
isInitialized: !1,
publicModule: { exports: {} },
}
In which:
`factory` is the module itself, accessible as a function that receives a fixed number of parameters.
`dependencyMap` is an array of module ids that will be provided to `module.factory` as a parameter so it can load other files (more on this later).
`publicModule` has an `exports` property which, after the module's `factory` function has been called, contains the module's exported objects.
The `t` object. It is basically a cache for modules that have already been loaded. It stores every module as an object keyed by its id, so it can be accessed through `t[moduleId]`. Note that `t` is declared with `var` inside the IIFE, so it is local to the runtime's closure rather than a property of `window`; outside code only reaches it through the functions attached to the global object.
`o` (exposed as `e.__c = o`) is sort of a constructor function for the cache variable `t`, because it is defined as:
function o() {
  return (t = Object.create(null));
}
It creates an object with `null` as its prototype, assigns it to the variable `t`, and returns `t`. After `o` is assigned to `e.__c`, the cache variable is defined by calling this function: `var t = o();`
The require function. `__r` is a property, also attached to the global object, that is assigned the function `i` (`e.__r = i`). This function serves as the require function because it loads any module given its id. It works as follows.
`i` is defined as:
function i(e) {
const r = e,
n = t[r]; // t is the cache object
return n && n.isInitialized ? n.publicModule.exports : d(r, n);
}
For this section I will use `r` as the module id and `n` as the module object, both initially passed to `d(r, n)` in `i()` (the function above, the definition of the require function). If the module has already been initialized, `i()` returns its exports. If not, it calls `d(r, n)`, which does nothing but return the result of calling `m(r, n)` with the exact same parameters (and handle errors, if any).
This `m` function is where the interesting stuff happens. It makes some checks on the module `n`, then sets its property `n.isInitialized` to true and calls the module, eventually returning its exports, as shown below:
n.isInitialized = !0;
const { factory: c, dependencyMap: d } = n;
try {
const t = n.publicModule;
return (
(t.id = r),
c(e, i, l, u, t, t.exports, d),
(n.factory = void 0),
(n.dependencyMap = void 0),
t.exports
);
} catch (e) { ... }
This is interesting because it calls the factory function (`c` above), which is the module itself. This function receives 7 parameters, which make certain objects accessible to the module:
1) The first parameter, `e`, is the global object.
2) The second parameter, `i`, is the `__r` function, which is the require function (it checks whether a module exists in the cache and returns its exports; if it is not cached, it calls the module, caches it, and then returns its exports).
3) The third parameter is a function that receives a module id and returns its `importedDefault` property, which from what I've seen is a global variable initialized as an empty object `{}` and, from what I've seen, is not modified in this file.
4) The fourth parameter is similar to the third: a function that receives a module id and returns its `importedAll` property, which for every module is initially the same object as `importedDefault`.
5) The fifth is the module's `publicModule` property (the object that holds the exports).
6) The sixth is `publicModule.exports`, i.e. the `exports` property of the fifth parameter.
7) The seventh is the module's `dependencyMap` property.
The `__d` method. The `__d` method loads modules into the cache. It is also attached to the global object and is defined as:
(e.__d = function (e, n, o) {
if (null != t[n]) return;
const i = {
dependencyMap: o,
factory: e,
hasError: !1,
importedAll: r,
importedDefault: r,
isInitialized: !1,
publicModule: { exports: {} },
};
t[n] = i;
}),
__d(
function (g, r, i, a, m, e, d) {
"use strict";
const t = r(d[0]).default || r(d[0]);
let c;
m.exports = () => c || ((c = t("locale")), c || "en");
},
"44cd5c",
["b2dff4"],
);
Looking at the `__d` method definition, we observe that it checks in `t` (the cache) whether the module id (second parameter) is an existing key. If it is, it returns (because the module is already loaded). If not, it assigns the object `i` to `t[n]` (where `n` is the module id), so `i` represents the loaded module in `t`.
asyncRequire.<hash>.js
The asyncRequire file is apparently what loads lazy-loaded files. Modules that need lazy-loaded files call this module as `r(d[4])(d[3])`, where 4 is the index, inside the calling module's dependency map, of the asyncRequire module's id. So `r(d[4])` loads asyncRequire, and the function it returns then receives the parameter `d[3]`, which points to the id of the lazy-loaded file. In this file there is a function passed as a parameter to `__d`, in which a function `T` is defined. This function is then assigned to a method of the `exports` object (the one returned by `__r`) called `setData`. This method is called at the end of the file, `__r("057569").setData( .. )`, and what it seems to do is:
Fill `h[o]`, a mapped array of arrays that holds the links for every segment id `o`:
function T(t, n, o) {
Object.entries(o).forEach(([o, s]) => {
const c = s.map((s) => {
if (void 0 === n[s])
throw new ReferenceError(
`Bad async module data, cannot locate index ${s} in the bundleRequestPaths array for segmentId=${o}`,
);
return `${t}${n[s]}`;
});
h[o] = c;
});
}
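Here is a readable sketch of what I think `T` (exposed as `setData`) does, with renamed parameters and made-up inputs. The prefix, the paths, and the segment ids below are invented for illustration, and `segmentUrls` only plays the role of `h` from the minified code.

```javascript
// Plays the role of `h`: segment id -> array of full URLs.
const segmentUrls = {};

// Renamed version of T(t, n, o):
//   prefix                = t (prepended to every path)
//   bundleRequestPaths    = n (array of paths)
//   segmentToPathIndexes  = o (segment id -> indexes into bundleRequestPaths)
function setData(prefix, bundleRequestPaths, segmentToPathIndexes) {
  Object.entries(segmentToPathIndexes).forEach(([segmentId, indexes]) => {
    segmentUrls[segmentId] = indexes.map((index) => {
      if (bundleRequestPaths[index] === undefined) {
        throw new ReferenceError(
          `Bad async module data, cannot locate index ${index} in the bundleRequestPaths array for segmentId=${segmentId}`,
        );
      }
      return `${prefix}${bundleRequestPaths[index]}`;
    });
  });
}

// Invented example data.
setData("https://cdn.example/", ["a.js", "b.js"], { 7: [0, 1], 9: [1] });
console.log(segmentUrls);
```

So after the call, each segment id maps to the list of URLs that have to be fetched for it, and an index that does not exist in the paths array is reported as a `ReferenceError`.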
In short, the runtime defines the cache `t`, the require function `__r`, and the `__d` method, and attaches the functions to the global object. Then there are some bundles that call the `__r` function directly; I think these are the entrypoints. These bundles are:
Those loaded through `__r("<asyncRequireModuleId>")("<lazyLoadedModuleId>")`, as defined in asyncRequire.
Those that call `__r` directly. Other bundles call the same moduleId and then run the `extends()` method over the exports object that `__r` returns. These are bundles that need translations to another language:
`extends()` is called on the `__r` return value. This is used for translations: in these bundles, the module id `__r("a9f4b1")` is used to load the translations, and all of the modules in that bundle that need translation load it.
This is the metroRequire file, the one that contains the runtime. I talked about other files at the end of the notes, but this is the main one.
var __BUNDLE_START_TIME__ = this.nativePerformanceNow
? nativePerformanceNow()
: Date.now(),
__DEV__ = false,
process = this.process || {};
process.env = process.env || {};
process.env.NODE_ENV = process.env.NODE_ENV || "production";
!(function (e) {
"use strict";
(e.__r = i),
(e.__d = function (e, n, o) {
if (null != t[n]) return;
const i = {
dependencyMap: o,
factory: e,
hasError: !1,
importedAll: r,
importedDefault: r,
isInitialized: !1,
publicModule: { exports: {} },
};
t[n] = i;
}),
(e.__c = o),
(e.__registerSegment = function (e, r, n) {
(p[e] = r),
n &&
n.forEach((r) => {
t[r] || h.has(r) || h.set(r, e);
});
});
var t = o();
const r = {},
{ hasOwnProperty: n } = {};
function o() {
return (t = Object.create(null));
}
function i(e) {
const r = e,
n = t[r];
return n && n.isInitialized ? n.publicModule.exports : d(r, n);
}
function l(e) {
const n = e;
if (t[n] && t[n].importedDefault !== r) return t[n].importedDefault;
const o = i(n),
l = o && o.__esModule ? o.default : o;
return (t[n].importedDefault = l);
}
function u(e) {
const o = e;
if (t[o] && t[o].importedAll !== r) return t[o].importedAll;
const l = i(o);
let u;
if (l && l.__esModule) u = l;
else {
if (((u = {}), l)) for (const e in l) n.call(l, e) && (u[e] = l[e]);
u.default = l;
}
return (t[o].importedAll = u);
}
(i.importDefault = l), (i.importAll = u);
let c = !1;
function d(t, r) {
if (!c && e.ErrorUtils) {
let n;
c = !0;
try {
n = m(t, r);
} catch (t) {
e.ErrorUtils.reportFatalError(t);
}
return (c = !1), n;
}
return m(t, r);
}
const s = 16,
a = 65535;
function f(e) {
return { segmentId: e >>> s, localId: e & a };
}
(i.unpackModuleId = f),
(i.packModuleId = function (e) {
return (e.segmentId << s) + e.localId;
});
const p = [],
h = new Map();
function m(r, n) {
if (!n && p.length > 0) {
const e = h.get(r) ?? 0,
o = p[e];
null != o && (o(r), (n = t[r]), h.delete(r));
}
const o = e.nativeRequire;
if (!n && o) {
const { segmentId: e, localId: i } = f(r);
o(i, e), (n = t[r]);
}
if (!n) throw g(r);
if (n.hasError) throw w(r, n.error);
n.isInitialized = !0;
const { factory: c, dependencyMap: d } = n;
try {
const t = n.publicModule;
return (
(t.id = r),
c(e, i, l, u, t, t.exports, d),
(n.factory = void 0),
(n.dependencyMap = void 0),
t.exports
);
} catch (e) {
throw (
((n.hasError = !0),
(n.error = e),
(n.isInitialized = !1),
(n.publicModule.exports = void 0),
e)
);
}
}
function g(e) {
return Error('Requiring unknown module "' + e + '".');
}
function w(e, t) {
return Error(
'Requiring module "' + e + '", which threw an exception: ' + t,
);
}
})(
"undefined" != typeof globalThis
? globalThis
: "undefined" != typeof global
? global
: "undefined" != typeof window
? window
: this,
);
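One detail of the runtime above that is easy to miss is how numeric module ids are split into a segment id and a local id (`unpackModuleId` / `packModuleId`). A small worked example, with readable constant names that are mine rather than Metro's:

```javascript
const SEGMENT_SHIFT = 16; // `s` in the minified runtime
const LOCAL_MASK = 65535; // `a` in the minified runtime (0xffff)

// Upper bits are the segment id, lower 16 bits are the local id.
function unpackModuleId(moduleId) {
  return { segmentId: moduleId >>> SEGMENT_SHIFT, localId: moduleId & LOCAL_MASK };
}

function packModuleId({ segmentId, localId }) {
  return (segmentId << SEGMENT_SHIFT) + localId;
}

const packed = packModuleId({ segmentId: 3, localId: 42 });
console.log(packed);                 // 196650, i.e. 3 * 65536 + 42
console.log(unpackModuleId(packed)); // { segmentId: 3, localId: 42 }
```

This only applies to numeric ids. It matters for the `nativeRequire` branch in `m`, which passes the local id and the segment id as separate arguments.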
Seems right so far. For analyzing how the runtime works, it's easier to create your own test bundles without minifying, or to search the Metro source code. It's also helpful to compare it with other bundles.
Supporting multi-file bundles is not going to be easy but can be added later, so for now it would be enough to focus on the `__d(...)` calls of a single script.
Example:
__d(
function (g, r, i, a, m, e, d) {
"use strict";
const t = r(d[0]).default || r(d[0]);
let c;
m.exports = () => c || ((c = t("locale")), c || "en");
},
"44cd5c",
["b2dff4"],
);
to
"use strict";
const t = require("./b2dff4.js").default || require("./b2dff4.js");
let c;
module.exports = () => c || ((c = t("locale")), c || "en");
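As a rough illustration of that mapping, here is a toy, text-based sketch. It assumes the factory parameters are literally named `r`, `m`, and `d` as in the minified example and uses regular expressions, which is only good enough for this snippet; a real implementation would rename the bindings on the AST. The function name `toCommonJsBody` and the `./<id>.js` path scheme are assumptions taken from the example output above.

```javascript
// Toy rewrite of a factory body: r(d[N]) -> require("./<id>.js"),
// m.exports -> module.exports. Hypothetical helper, regex-based on purpose.
function toCommonJsBody(factoryBody, dependencyMap) {
  return factoryBody
    .replace(/\br\(d\[(\d+)\]\)/g, (match, index) => `require("./${dependencyMap[Number(index)]}.js")`)
    .replace(/\bm\.exports\b/g, "module.exports");
}

const factoryBody = [
  '"use strict";',
  "const t = r(d[0]).default || r(d[0]);",
  "let c;",
  'm.exports = () => c || ((c = t("locale")), c || "en");',
].join("\n");

const rewritten = toCommonJsBody(factoryBody, ["b2dff4"]);
console.log(rewritten);
```

The third argument of `__d` supplies the ids, so `d[0]` resolves to `"b2dff4"` and the body comes out as the CommonJS version shown above.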
> The third parameter is a function that receives a module id and returns its `importedDefault` property, which from what I've seen is a global variable initialized as an empty object `{}` and, from what I've seen, is not modified in this file. The fourth parameter is similar to the third: a function that receives a module id and returns its `importedAll` property, which for every module is initially the same object as `importedDefault`.
They are used for `import v from 'foo'` and `import * as w from 'bar';` in ESM.
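To spell out what that means for the exports object, here is a sketch of the interop logic that `l` (third argument) and `u` (fourth argument) apply in the runtime posted earlier. It works on an exports object directly instead of a module id and skips the per-module caching, and the function names are mine:

```javascript
// What `import v from 'foo'` receives: the `default` export of an ES module,
// or the whole exports object of a CommonJS module.
function importDefault(exported) {
  return exported && exported.__esModule ? exported.default : exported;
}

// What `import * as w from 'bar'` receives: the ES module itself, or a copy of
// the CommonJS exports with an added `default` pointing at the original.
function importAll(exported) {
  if (exported && exported.__esModule) return exported;
  const namespace = {};
  if (exported) {
    for (const key in exported) {
      if (Object.prototype.hasOwnProperty.call(exported, key)) {
        namespace[key] = exported[key];
      }
    }
  }
  namespace.default = exported;
  return namespace;
}

const esModule = { __esModule: true, default: "d", named: 1 };
const cjsModule = { named: 1 };

console.log(importDefault(esModule));  // "d"
console.log(importDefault(cjsModule)); // the cjsModule object itself
console.log(importAll(cjsModule));     // copy of cjsModule plus a `default` key
```

For an unpacker this suggests that calls through the third and fourth factory parameters can be turned back into default and namespace imports.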
> Then there are some bundles that call the `__r` function directly; I think these are the entrypoints.
Yes. Apparently there can even be multiple top-level `__r` calls: https://github.com/getsentry/sentry-cli/blob/844cee0d263204b0b2fb75688b58fd83f13b15b9/tests/integration/_fixtures/file-ram-bundle/index.android.bundle
I want to contribute support for other bundlers. Right now I am working on a website that uses the Metro bundler. Which files in the project should I take into consideration for doing this? And would you consider this as something feasible to implement?