j4k0xb / webcrack

Deobfuscate obfuscator.io, unminify and unpack bundled javascript
https://webcrack.netlify.app
MIT License
1.07k stars 129 forks

Unknown obfuscation not handled #76

Open 0xdevalias opened 6 months ago

0xdevalias commented 6 months ago

Found this obfuscated JS on a site today; the web version of webcrack didn't seem able to demangle it very well:

sec-cpt-4-4.js.txt

Not sure if relevant, but localStorage has this in it:

image

0xdevalias commented 6 months ago

Another example from the same site, unsure if using the same method:

WiNOMy0.js.txt

0xdevalias commented 6 months ago

I'm manually skimming through parts of WiNOMy0.js.txt at the moment. Aside from basic wrapper functions used to obfuscate things (which I've renamed), the first function called within the IIFE ends up at this, which seems to take the .toString of the main IIFE and then extract parts out of it:

function Jn() {
  In = QVn(toString(mainIIFE), "rTcEYQdbXf", "60d96a3");
}

function QVn(mainIIFEAsString, s_rTcEYQdbXf, s_60d96a3) {
  var SVn = indexOf(mainIIFEAsString, "0x" + s_60d96a3);
  var gVn = indexOf(mainIIFEAsString, ";", SVn);
  var LVn = SVn + length(s_60d96a3) + 3;
  var EVn = substr(mainIIFEAsString, LVn, gVn - LVn);
  var cVn = substr(mainIIFEAsString, 0, SVn);
  var xVn = substr(mainIIFEAsString, gVn + 1);
  var YVn = cVn + xVn + typeof nn[s_rTcEYQdbXf];
  var IVn = WVn(YVn, 435614);
  return EVn - IVn;
}

function WVn(jVn, tVn) {
  var CVn = tVn;
  var RVn = 3432918353;
  var OVn = 461845907;
  var PVn = 0;
  for (var rVn = 0; rVn < length(jVn); ++rVn) {
    var wVn = kVn(jVn, rVn);
    if (wVn === 10 || wVn === 13 || wVn === 32) {
      continue;
    }
    wVn = (wVn & 65535) * RVn + (((wVn >>> 16) * RVn & 65535) << 16) & 4294967295;
    wVn = wVn << 15 | wVn >>> 17;
    wVn = (wVn & 65535) * OVn + (((wVn >>> 16) * OVn & 65535) << 16) & 4294967295;
    CVn ^= wVn;
    CVn = CVn << 13 | CVn >>> 19;
    var UVn = (CVn & 65535) * 5 + (((CVn >>> 16) * 5 & 65535) << 16) & 4294967295;
    CVn = (UVn & 65535) + 27492 + (((UVn >>> 16) + 58964 & 65535) << 16);
    ++PVn;
  }
  CVn ^= PVn;
  CVn ^= CVn >>> 16;
  CVn = (CVn & 65535) * 2246822507 + (((CVn >>> 16) * 2246822507 & 65535) << 16) & 4294967295;
  CVn ^= CVn >>> 13;
  CVn = (CVn & 65535) * 3266489909 + (((CVn >>> 16) * 3266489909 & 65535) << 16) & 4294967295;
  CVn ^= CVn >>> 16;
  return CVn >>> 0;
}
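
A side note on WVn: its constants match the ones used by 32-bit MurmurHash3 (3432918353 = 0xcc9e2d51, 461845907 = 0x1b873593, 2246822507 = 0x85ebca6b, 3266489909 = 0xc2b2ae35, and 58964 / 27492 together form 0xe6546b64). It looks like a MurmurHash3 variant that mixes one character code per round and skips newline, carriage return and space. Below is a rough readable sketch, assuming kVn wraps charCodeAt, length wraps .length, and the split 16-bit multiplications are just 32-bit multiplies (Math.imul). The name murmurLikeHash is mine and does not appear in the script.

```javascript
// Readable sketch of WVn (assumption: kVn(s, i) === s.charCodeAt(i)).
function murmurLikeHash(text, seed) {
  let h = seed;
  let count = 0; // number of non-whitespace characters mixed in
  for (let i = 0; i < text.length; i++) {
    let k = text.charCodeAt(i);
    // Whitespace is ignored, so reformatting the source does not change the hash.
    if (k === 10 || k === 13 || k === 32) continue;
    k = Math.imul(k, 0xcc9e2d51);
    k = (k << 15) | (k >>> 17); // rotate left by 15
    k = Math.imul(k, 0x1b873593);
    h ^= k;
    h = (h << 13) | (h >>> 19); // rotate left by 13
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }
  // Final avalanche, as in MurmurHash3's fmix32.
  h ^= count;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Whitespace-insensitive by construction:
console.log(murmurLikeHash("a b\n", 435614) === murmurLikeHash("ab", 435614));
```

Note that `count` is never incremented in this first draft of the loop above if you copy it as-is without the `count++` line; the version that mirrors WVn (which does `++PVn` per mixed character) needs `count++;` as the last statement inside the loop:

```javascript
// Corrected loop tail, mirroring `++PVn` in WVn:
//   h = (Math.imul(h, 5) + 0xe6546b64) | 0;
//   count++;
```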

In QVn, it basically works out to be:

var s_rTcEYQdbXf = "rTcEYQdbXf";
var s_60d96a3 = "60d96a3";

var index1 = mainIIFEString.indexOf("0x" + s_60d96a3);
var index2 = mainIIFEString.indexOf(";", index1);
var index3 = index1 + s_60d96a3.length + 3;
var substr1 = mainIIFEString.substr(index3, index2 - index3);
var substr2 = mainIIFEString.substr(0, index1);
var substr3 = mainIIFEString.substr(index2 + 1);
//var YVn = substr2 + substr3 + typeof nn[s_rTcEYQdbXf];
//var IVn = unknownComplicatedCalcFunc(YVn, 435614);
// return substr1 - IVn
console.log({ index1, index2, index3, substr1, substr2, substr3 })
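
Read this way, QVn looks like a self-integrity check: a value embedded in the function's own source is cut out, the remaining source is hashed, and the two are subtracted. The sketch below shows that shape under a few assumptions of mine: the embedded value is written as `0x<marker>=<value>;` (the real separator is unknown, only its one-character width follows from the `+ 3`), `toyHash` stands in for WVn, and the extra `typeof nn[...]` suffix is left out. The names `toyHash` and `integrityDelta` are hypothetical.

```javascript
// Stand-in hash for the demo (assumption: any deterministic hash shows the idea).
function toyHash(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (Math.imul(h, 31) + text.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Same steps as QVn: find "0x<marker>", read the value up to ";",
// remove that span from the source, hash what is left, subtract.
function integrityDelta(source, marker) {
  const start = source.indexOf("0x" + marker);
  const end = source.indexOf(";", start);
  const valueStart = start + marker.length + 3; // skip "0x", the marker, and one separator
  const embedded = Number(source.slice(valueStart, end));
  const rest = source.slice(0, start) + source.slice(end + 1);
  return embedded - toyHash(rest);
}

const marker = "60d96a3";
const body = "var a = 1; var b = 2;";
const stamped = "0x" + marker + "=" + toyHash(body) + ";" + body;

console.log(integrityDelta(stamped, marker)); // 0 for untouched source
console.log(integrityDelta(stamped.replace("b = 2", "b = 3"), marker) !== 0); // true once edited
```

If the real script feeds this difference into later arithmetic, any edit to the IIFE (including renaming things while deobfuscating) would shift the derived values, which may be worth keeping in mind when evaluating parts of it.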

The math / processing gets a bit heavy there, so I decided to start looking at the 2nd function call in the IIFE, which seems to use JSFuck-style type coercion patterns to calculate integers (the comments are mine):

function zVn() {
  int_5 = +!+[] + !+[] + !+[] + !+[] + !+[];               // 5
  int_2 = !+[] + !+[];                                     // 2
  int_10 = [+!+[]] + [+[]] - [];                           // 10
  int_4 = !+[] + !+[] + !+[] + !+[];                       // 4
  int_3 = +!+[] + !+[] + !+[];                             // 3
  int_9 = [+!+[]] + [+[]] - +!+[];                         // 9
  int_7 = +!+[] + !+[] + !+[] + !+[] + !+[] + !+[] + !+[]; // 7
  int_8 = [+!+[]] + [+[]] - +!+[] - +!+[];                 // 8 
  int_6 = +!+[] + !+[] + !+[] + !+[] + !+[] + !+[];        // 6
  int_0 = +[];                                             // 0
  int_1 = +!+[];                                           // 1
}
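
For anyone unfamiliar with these expressions, they rely only on standard JavaScript coercion rules. A small sketch that breaks a few of them into steps (the variable names are mine):

```javascript
const zero = +[];                     // [] -> "" -> 0
const one = +!+[];                    // +[] is 0, !0 is true, +true is 1
const two = !+[] + !+[];              // true + true -> 2
const tenString = [+!+[]] + [+[]];    // [1] + [0] -> "1" + "0" -> "10"
const ten = tenString - [];           // "10" - [] -> 10 - 0 -> 10
const nine = [+!+[]] + [+[]] - +!+[]; // "10" - 1 -> 9

console.log({ zero, one, two, tenString, ten, nine });
```

Since none of these depend on runtime state, they can be folded to literals safely.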

These then seem to be combined, mostly as decimal digits multiplied by powers of int_10, into larger constants in the 3rd function called within the IIFE:

function FVn() {
  tD = int_9 + int_10 + int_3 * int_10 * int_10 + int_10 * int_10 * int_10;
  Qw = int_6 + int_8 * int_10 + int_2 * int_10 * int_10 + int_10 * int_10 * int_10;
  bv = int_1 + int_6 * int_10 + int_4 * int_10 * int_10 + int_10 * int_10 * int_10;
  AR = int_9 + int_4 * int_10;
  qk = int_0 + int_4 * int_10 + int_6 * int_10 * int_10 + int_10 * int_10 * int_10;
  b9 = int_9 + int_5 * int_10 + int_6 * int_10 * int_10;
  zt = int_2 + int_8 * int_10 + int_7 * int_10 * int_10;
  // ..snip..
  d0 = int_5 + int_0 * int_10 + int_6 * int_10 * int_10;
  gO = int_6 + int_3 * int_10 + int_2 * int_10 * int_10;
  Tq = int_3 + int_9 * int_10 + int_6 * int_10 * int_10 + int_10 * int_10 * int_10;
  Dr = int_6 + int_6 * int_10;
}
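
Once int_0 to int_10 are known, each of these right-hand sides is plain arithmetic and can be folded to a literal. A minimal sketch, assuming the expression text has already been extracted from the script; `foldConstant` and `knownInts` are hypothetical names. It uses `new Function`, so it should only be given expressions you have checked to be pure arithmetic.

```javascript
// int_0 .. int_10 as computed by zVn.
const knownInts = {};
for (let n = 0; n <= 10; n++) knownInts["int_" + n] = n;

// Evaluate an arithmetic expression with the known names bound as parameters.
function foldConstant(expression, known) {
  const names = Object.keys(known);
  const evaluate = new Function(...names, "return (" + expression + ");");
  return evaluate(...names.map((name) => known[name]));
}

const tD = foldConstant(
  "int_9 + int_10 + int_3 * int_10 * int_10 + int_10 * int_10 * int_10",
  knownInts
);
const AR = foldConstant("int_9 + int_4 * int_10", knownInts);

console.log(tD, AR); // 1319 49
```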

The 4th IIFE function call basically wraps one of the integers calculated there in an array, which is then just unwrapped again later, so it only serves as indirection:

var nVNArray = wrapnVnAsArray();

function wrapnVnAsArray() {
  return [int_1557045567];
}

// Accessed later like this:
//   if (Fn.En[int_0] > int_0) {
//    VA(L3[int_0] - nVNArray[int_0]);
//   }

The 5th IIFE function call is just setting another variable; which I think gets used as indirection again later:

function fln() {
  HLn = ["qC"];
}

The 6th IIFE call does some more wrapping stuff, where xT seems to be one of the numbers calculated in the 3rd IIFE function:

function Gln() {
  ZLn = [xT];
}

The 7th IIFE call is another array wrapped string:

function mln() {
  Mdn = ["tj"];
}

The 8th IIFE call is another array wrapper:

function Nln() {
  return [Gj];
}

etc

Skimming down through it, there are a lot of other function definitions, and then seemingly buried among those there is a raw call to this (as part of the main IIFE):

return VA.call(this, GC);

Even if it's not possible (or worthwhile) to fully de-obfuscate this, I wonder if it might be possible to develop rules for some of the patterns used within it, which could reduce some of the noise along the way and make other aspects easier to spot.

j4k0xb commented 6 months ago

It's only designed to deobfuscate javascript-obfuscator and these scripts look very different.

I wonder if some of the patterns used within it might be possible to develop rules for

E.g. simplifying string/number expressions can work in general, but the majority of transformations have to be made for a specific obfuscator and have unsafe assumptions that would break other scripts.

Here are projects that try to support many different ones: https://github.com/PerimeterX/restringer, https://github.com/ben-sb/javascript-deobfuscator

Instead I'd rather add more interactive actions that make manually working on unknown obfuscators faster and let the user decide if it's safe.

Currently (multi cursor support): image

Planned:

0xdevalias commented 6 months ago

It's only designed to deobfuscate javascript-obfuscator and these scripts look very different.

@j4k0xb Oh, true. I thought the plan was to be a general deobfuscator. My bad.


E.g. simplifying string/number expressions can work in general, but the majority of transformations have to be made for a specific obfuscator and have unsafe assumptions that would break other scripts

@j4k0xb Yeah, that makes sense.

I don't know how universally it would apply, but one of the things I was thinking about in terms of this script (which I think probably should work for others without breaking things), is to look at how many times/places a function/variable is used, and if it's only once, potentially inline it. Obviously the 'devil is in the detail' of the nuance of how that's implemented though I guess.. and might still be hard to genericise.

The other similar idea I had was in either inlining, or at least automatically renaming little 'gadget functions'; eg. the ones where they're just a function wrapper around .length or .indexOf or similar.
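
As a very rough illustration of the 'gadget function' idea, here is a text-level sketch that spots two wrapper shapes: a one-argument function returning a property of its argument, and a two-argument function calling a method on the first argument with the second. The shapes, the regexes and the name `findGadgets` are assumptions of mine; the real wrappers in these scripts may look different, and a proper implementation would work on an AST with scope information rather than on regexes.

```javascript
function findGadgets(source) {
  const gadgets = [];
  // function f(a) { return a.prop; }
  const propertyRe =
    /function\s+([\w$]+)\s*\(\s*([\w$]+)\s*\)\s*\{\s*return\s+\2\.([\w$]+)\s*;?\s*\}/g;
  // function f(a, b) { return a.method(b); }
  const methodRe =
    /function\s+([\w$]+)\s*\(\s*([\w$]+)\s*,\s*([\w$]+)\s*\)\s*\{\s*return\s+\2\.([\w$]+)\(\s*\3\s*\)\s*;?\s*\}/g;

  for (const m of source.matchAll(propertyRe)) {
    gadgets.push({ name: m[1], kind: "property", member: m[3] });
  }
  for (const m of source.matchAll(methodRe)) {
    gadgets.push({ name: m[1], kind: "method", member: m[4] });
  }
  return gadgets;
}

const sample =
  "function Ab(x){return x.length}" +
  "function Cd(a,b){return a.indexOf(b);}";

console.log(findGadgets(sample));
// [ { name: 'Ab', kind: 'property', member: 'length' },
//   { name: 'Cd', kind: 'method', member: 'indexOf' } ]
```

The returned list could drive either a rename (Ab to length, Cd to indexOf) or an inlining pass, with the user confirming each suggestion.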


Here are projects that try to support many different ones: PerimeterX/restringer, ben-sb/javascript-deobfuscator

@j4k0xb Oh, sweet; will definitely have to check those out. Thanks!


I'd rather add more interactive actions that make manually working on unknown obfuscators faster and let the user decide if it's safe

@j4k0xb Interesting.. yeah, I can definitely see a lot of value/flexibility in that workflow. Though I guess that approach wouldn't be incompatible with what I was thinking above either; some of those 'smaller fixes' could be implemented in the 'manual toolkit' way rather than as part of fully automated unpacking.


Manually add code (e.g. functions) that will be executed alongside the eval/replace action

@j4k0xb Interesting.. will that be just like raw JS? Or like, functions that operate on an AST/etc?

0xdevalias commented 6 months ago

Here are projects that try to support many different ones: PerimeterX/restringer, ben-sb/javascript-deobfuscator

Instead I'd rather add more interactive actions that make manually working on unknown obfuscators faster and let the user decide if it's safe

Linked from that restringer repo, I came across the obfuscation-detector project:

It could be cool to have a similar sort of 'obfuscation detector' feature within webcrack, particularly if it was paired with the 'interactive actions'. The 'detector' rules could suggest which obfuscations seem to be in place, and could then potentially recommend corresponding rules, etc.


Unfortunately the obfuscation detector didn't seem too useful on my above examples:

⇒ npx obfuscation-detector ~/Desktop/sec-cpt-4-4.js.txt
[-] No obfuscation detected / unknown obfuscation

⇒ npx obfuscation-detector ~/Desktop/WiNOMy0.js.txt
[-] No obfuscation detected / unknown obfuscation

I tried https://github.com/ben-sb/javascript-deobfuscator, but it didn't seem to do too well, and I got an error while trying to use its 'Replace Proxy Functions':


Also tried https://github.com/PerimeterX/restringer , which is still running, but at least from the output log lines it definitely sounds like it's doing something (will have to wait and see what the final output looks like to know if that's something useful or not)

Running in -v (verbose) mode, I can see that it seems to spend a LOT of time in things like the following:

Which seem to end up with a lot of errors like this:

[-] Error in _evalInVm: j8n is not defined
[-] Error in _evalInVm: Xnn is not defined
[-] Error in _evalInVm: bnn is not defined
[-] Error in _evalInVm: GXn is not defined

Looks like it took ~95min to run through the full deobfuscation loop:

[!] REstringer v1.10.2
[!] Deobfuscating /Users/devalias/Desktop/WiNOMy0.js.txt...
[+] Obfuscation type is Generic
..snip..
[+] ==> Cycle 1 completed in 387.071 seconds with 987 changes (95039 nodes)
..snip..
[+] ==> Cycle 2 completed in 1053.753 seconds with 2475 changes (95998 nodes)
..snip..
[+] ==> Cycle 3 completed in 961.102 seconds with 23 changes (96799 nodes)
..snip..
[+] ==> Cycle 4 completed in 970.483 seconds with 1 changes (96800 nodes)
..snip..
[+] ==> Cycle 5 completed in 1125.627 seconds with 1 changes (96805 nodes)
..snip..
[+] ==> Cycle 6 completed in 1008.245 seconds with 1 changes (96807 nodes)
..snip..
[+] ==> Cycle 7 completed in 70.284 seconds with no changes (96807 nodes)
..snip..
[+] ==> Cycle 8 completed in 91.749 seconds with no changes (96807 nodes)
  [!] Running normalizeComputed...
  [+] normalizeComputed committed 9 new changes!
    [!] Running normalizeComputed completed in 2.473 seconds
  [!] Running normalizeRedundantNotOperator...
[-] Error in _evalInVm: nn is not defined
  [+] normalizeRedundantNotOperator committed 9 new changes!
    [!] Running normalizeRedundantNotOperator completed in 2.139 seconds
  [!] Running normalizeEmptyStatements...
    [!] Running normalizeEmptyStatements completed in 0.008 seconds
[+] ==> Cycle 9 completed in 4.632 seconds with 18 changes (96798 nodes)
  [!] Running normalizeComputed...
    [!] Running normalizeComputed completed in 0.015 seconds
  [!] Running normalizeRedundantNotOperator...
    [!] Running normalizeRedundantNotOperator completed in 0.029 seconds
  [!] Running normalizeEmptyStatements...
    [!] Running normalizeEmptyStatements completed in 0.009 seconds
[+] ==> Cycle 10 completed in 0.064 seconds with no changes (96798 nodes)
[+] Saved /Users/devalias/Desktop/WiNOMy0.js.deobfs.txt
[!] Deobfuscation took 5702.785 seconds.

And even after all of that, I don't think it really ended up being very useful:

I wish it was possible to Ctrl-C at an arbitrary point and have it output its current 'iteration'; but seemingly that's not possible, and you just lose everything.

I suspect these run so slowly because they're 'unsafe' and thus being evaluated in isolated-vm: