kamil-kielczewski / small-jsfuck

Generate small jsf code

Alternative coding #4

Closed kamil-kielczewski closed 4 years ago

kamil-kielczewski commented 4 years ago

Check octal coding idea proposed by jsfuck author aemkei here:

I like the idea of encoding the characters into numbers in a bootstrap to save space.

Have you thought about using octal sequences? This would save some space per character:

EG:

eval(eval("'91419154914591629164950961951'".replace(/9/g, "\\")))

The bootstrap code is ~25k but maybe we can save some bytes by replacing the quotes or backslash.

'\141\154\145\162\164\50\61\51' in chrome console gives "alert(1)"
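A minimal sketch of the encoding side of this idea, to make the example above reproducible. The helper names `toOctalEscapes` and `fromOctalEscapes` are hypothetical and not part of jsfuck or this repository. The decoder is a plain stand-in for what `eval` of the quoted string does in the original snippet, and the encoder rejects anything above char code 255, since octal escapes cannot express it.

```javascript
// Hypothetical helper: encode each character as a backslash plus its octal code.
function toOctalEscapes(text) {
  return [...text].map((ch) => {
    const code = ch.charCodeAt(0);
    // Octal escapes only cover char codes 0-255 (octal 0-377).
    if (ch.length > 1 || code > 255) {
      throw new RangeError(`cannot octal-escape "${ch}"`);
    }
    return '\\' + code.toString(8);
  }).join('');
}

// Hypothetical helper: decode the escapes without eval (stand-in for the JS parser).
function fromOctalEscapes(escaped) {
  return escaped.replace(/\\([0-7]{1,3})/g, (_, oct) =>
    String.fromCharCode(parseInt(oct, 8)));
}

const escaped = toOctalEscapes('alert(1)');
console.log(escaped); // \141\154\145\162\164\50\61\51

// Digits-only form from the quoted example: 9 stands in for the backslash.
const digitsOnly = escaped.replace(/\\/g, '9');
console.log(digitsOnly); // 91419154914591629164950961951

console.log(fromOctalEscapes(escaped)); // alert(1)
```

Because every character gets its own backslash, a short escape such as `\50` is never followed by a raw digit, so no zero-padding is needed.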

Check if this works with emoji/Chinese letters

kamil-kielczewski commented 4 years ago
  1. The "\n" escape technique works when n is an octal (base8) number in the range 0-377 (decimal: 0-255). The code below shows all characters:
[...Array(256)].map((x,i)=> `${i.toString(8)} ` +eval(`"\\${i.toString(8)}"`) )

As we see in that coding

  1. I wrote this tool to gather statistics and calculate the proportion of 3-char codes to 4-char codes (100% = only 3-char codes, 0% = only 4-char codes). The closer the result is to 100%, the more we gain from switching from base4 to base8.

Results for example libs (average prop=33%)

Results for example minified libs (average prop=24%)

So minified libs have 24% of characters that can be written using 3 characters in base8. Non-minified libs have 33% (probably due to the many whitespace characters). But we can assume that users will usually convert minified code to get the jsfuck version (because it is smaller).
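The statistic described above can be sketched as follows. This is not the linked tool, only an illustration; the function name `threeCharShare` is hypothetical, and it assumes that characters outside the 3-char and 4-char ranges (codes below 8 or above 255) are simply ignored.

```javascript
// Hypothetical helper: share of 3-char octal codes (\NN) among all 3- and 4-char codes (\NNN).
function threeCharShare(source) {
  let three = 0;
  let four = 0;
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    if (code > 255) continue;                       // not expressible as an octal escape
    const length = 1 + code.toString(8).length;     // backslash + octal digits
    if (length === 3) three++;
    else if (length === 4) four++;
  }
  const total = three + four;
  return total === 0 ? NaN : three / total;          // 1 = only 3-char, 0 = only 4-char
}

console.log(threeCharShare('alert(1)')); // 0.375  ("(", "1", ")" are 3-char codes)
```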

  1. So in base8 we have 9 characters (0-7 and backslash) for which we want to find the shortest jsfuck representations. This is my proposal (I add + before each representation because it must be concatenated with the rest of the string):

    0 -> +(+!![])           //         1 ( 8 chars)
    1 -> +!![]              //      true ( 5 chars)
    2 -> +(+[])             //         0 ( 6 chars)
    3 -> +[][[]]            // undefined ( 7 chars)
    4 -> +(+[![]])          //       NaN ( 9 chars)
    5 -> +(!![]+!![])       //         2 (12 chars)
    6 -> +(![]+[])[+![]]    //         f (15 chars)
    7 -> +(!![]+[])[+![]]   //         t (16 chars)
    8 -> +![]               //     false ( 4 chars)
    • I chose the shortest jsf representation for the backslash \ (8) because it appears before each char.
    • I use the second-shortest jsf code for the digit 1 because 4-character base8 codes (which make up >75% of minified code) always start with 1 (codes >= 200 are useless in typical app source code).
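As a quick check of the table above, the following sketch evaluates each fragment and estimates the jsfuck length of one encoded character. The names `fragments` and `octalCost` are hypothetical, and the cost counts only the fragment characters from the table, not the bootstrap code that decodes them.

```javascript
// Fragments from the table: index 0-7 is an octal digit, index 8 is the backslash.
const fragments = [
  '+(+!![])', '+!![]', '+(+[])', '+[][[]]', '+(+[![]])',
  '+(!![]+!![])', '+(![]+[])[+![]]', '+(!![]+[])[+![]]', '+![]',
];

// What each fragment appends when concatenated to a string.
const appended = fragments.map((fragment) => eval("''" + fragment));
console.log(appended);
// [ '1', 'true', '0', 'undefined', 'NaN', '2', 'f', 't', 'false' ]

// Hypothetical helper: jsfuck length of one character in this base8 scheme.
function octalCost(ch) {
  const digits = ch.charCodeAt(0).toString(8);
  let cost = fragments[8].length;               // the leading backslash
  for (const digit of digits) {
    cost += fragments[Number(digit)].length;    // one fragment per octal digit
  }
  return cost;
}

console.log(octalCost('!')); // 18  (\41  -> 4 + 9 + 5)
console.log(octalCost('a')); // 23  (\141 -> 4 + 5 + 9 + 5)
```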
  2. Character comparison of base8 vs base4 with this tool. In this tool we use an optimized base4 map (built in the same way as for base8; details here: #1). After the comparison it is clear that base8 has shorter codes only for these 7 characters: !"#*+JK. However, only 5 of them, !"#*+, have a 3-char base8 representation (which gives us a profit).

  3. Conclusion: Let's assume a set of ~95 critical ASCII characters used in typical code (ASCII decimal codes 32-126). In this set only 5 characters, !"#*+, have a 3-char base8 representation that gives us a profit (the others give no profit or a loss). 3-char base8 representations make up about 25% of typical minified library code, and ASCII has 32 characters with a 3-char base8 representation, of which these 5 are about 15%. So we profit on roughly 15% * 25% ≈ 4% of the input code; for the remaining ~96% we have no profit or a loss. It is not worth implementing this. You can test it yourself by typing code in this tool.
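The estimate in the conclusion can be re-derived in a few lines. This is only a back-of-the-envelope sketch: the 25% share is taken from the measurements quoted above, and multiplying the two shares assumes the 3-char characters are used about equally often, which real code will not satisfy exactly. The exact product comes out a little under 4%, the same order of magnitude as the rounded figure in the conclusion.

```javascript
// Printable ASCII characters, decimal codes 32-126.
const printable = [];
for (let code = 32; code <= 126; code++) printable.push(code);

// Characters whose octal code has two digits, i.e. a 3-char escape like \41.
const threeChar = printable.filter((code) => code.toString(8).length === 2);
console.log(threeChar.length); // 32

// The 5 characters reported as profitable in the comparison.
const profitable = [...'!"#*+'];
const profitableShare = profitable.length / threeChar.length; // 5 / 32

// Share of 3-char codes in minified code, rounded from the statistics above.
const threeCharShareInCode = 0.25;

const estimate = profitableShare * threeCharShareInCode;
console.log((estimate * 100).toFixed(1) + '%'); // 3.9%
```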

kamil-kielczewski commented 4 years ago

Here: #6 is a more promising modification of this approach, which can be checked and may be implemented in the near future.