Alternative coding - Githubissues

kamil-kielczewski commented 4 years ago

Check octal coding idea proposed by jsfuck author aemkei here:

I like the idea of encoding the characters into numbers in a bootstrap to save space.

Have you thought about using octal sequences? This would save some space per character:

EG:

eval(eval("'91419154914591629164950961951'".replace(/9/g, "\\")))

The bootstrap code is ~25k, but maybe we can save some bytes by replacing the quotes or the backslash.

'\141\154\145\162\164\50\61\51' in chrome console gives "alert(1)"

Check if this works with emoji/Chinese letters

kamil-kielczewski commented 4 years ago

This technique "\n" works when n is an octal (base8) number in the range 0-377 (decimal: 0-255). The code below shows all the characters:

[...Array(256)].map((x,i)=> `${i.toString(8)} ` +eval(`"\\${i.toString(8)}"`) )
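The range limit can also be checked from the encoding side. Below is a minimal sketch, assuming two hypothetical helpers (`toOctalEscapes` and `fromOctalEscapes` are not part of jsfuck or small-jsfuck): the first turns each character into an octal escape and rejects anything above 377 octal, the second decodes without `eval`, so it also runs in strict mode where octal escapes in string literals are not allowed.

```javascript
// Hypothetical helper: encode text as JS octal escapes, one escape per character.
function toOctalEscapes(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code > 0o377) {
      // Octal escapes stop at \377 (decimal 255), so this character cannot be encoded.
      throw new RangeError("U+" + code.toString(16).toUpperCase() + " does not fit in an octal escape");
    }
    out += "\\" + code.toString(8);
  }
  return out;
}

// Hypothetical helper: decode the escapes without eval.
// The pattern accepts the same range as a string literal: \0 .. \377.
function fromOctalEscapes(escaped) {
  return escaped.replace(/\\([0-3][0-7]{2}|[0-7]{1,2})/g,
    (_, oct) => String.fromCharCode(parseInt(oct, 8)));
}

const encoded = toOctalEscapes("alert(1)");
console.log(encoded);                   // \141\154\145\162\164\50\61\51
console.log(fromOctalEscapes(encoded)); // alert(1)
```

With this sketch a character such as "中" (U+4E2D) throws a RangeError, so emoji and Chinese letters would need a different escape form than plain octal.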

As we see, in this coding:

- a letter must be replaced by 4 characters (backslash + 3 digits), e.g. "\101\132\141\172" gives "AZaz",
- a special1 char is replaced by 4 characters (backslash + 3 digits), e.g. "\100\133\134\135\137\140\173\174\175" gives @[\]_`{|}
- a number must be replaced by 3 characters (backslash + 2 digits), e.g. "\60\61\62\71" gives "0129",
- a special2 char is replaced by 3 characters (backslash + 2 digits), e.g. "\41\42\44\45\46\47\50\51\52\53\54\55\56\57\12\40" gives !"$%&'()*+,-./ followed by a new line and a space

I wrote this tool to gather statistics and calculate the share of 3-char codes among all codes (100% = only 3-char codes, 0% = only 4-char codes). The closer it is to 100%, the more we gain from switching from base4 to base8.
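A rough sketch of how such statistics could be computed is shown here. It is not the code of the tool itself: `classify`, `countClasses` and `prop` are hypothetical names, the group boundaries are inferred from the four groups listed above, and the formula (3-char codes divided by 3-char plus 4-char codes, rounded down) is an assumption.

```javascript
// Hypothetical classifier: letters and spec1 need 4 chars (\ + 3 digits),
// numbers and spec2 need 3 chars (\ + 2 digits); everything else is "other".
function classify(ch) {
  const code = ch.charCodeAt(0);
  if (/^[A-Za-z]$/.test(ch)) return "letters";
  if (/^[0-9]$/.test(ch)) return "numbers";
  if (code >= 0o100 && code <= 0o177) return "spec1";
  if (code >= 0o10 && code <= 0o77) return "spec2";
  return "other";
}

// Count each group; "total" counts every character, including "other".
function countClasses(source) {
  const stats = { letters: 0, spec1: 0, numbers: 0, spec2: 0, total: 0 };
  for (let i = 0; i < source.length; i++) {
    const group = classify(source[i]);
    if (group !== "other") stats[group]++;
    stats.total++;
  }
  return stats;
}

// Assumed formula: percentage of 3-char codes among 3-char and 4-char codes, rounded down.
function prop(stats) {
  const threeChar = stats.numbers + stats.spec2;
  const fourChar = stats.letters + stats.spec1;
  return Math.floor(100 * threeChar / (threeChar + fourChar));
}

const sample = countClasses("a=1;\n");
console.log(JSON.stringify(sample)); // {"letters":1,"spec1":0,"numbers":1,"spec2":3,"total":5}
console.log(prop(sample) + "%");     // 80%
```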

Results for example libs (average prop=33%)

react prop: 28% -> {"letters":40767,"spec1":1479,"numbers":140,"spec2":16750,"total":60644}
jquery prop: 29% -> {"letters":173935,"spec1":7136,"numbers":2170,"spec2":71996,"total":287629}
rxjs prop: 43% -> {"letters":184207,"spec1":8584,"numbers":832,"spec2":147876,"total":351229}
d3.js prop: 41% -> {"letters":294070,"spec1":15719,"numbers":23163,"spec2":193491,"total":550259}
charjs prop: 32% -> {"letters":345517,"spec1":15579,"numbers":9908,"spec2":162666,"total":578944}
three.js prop: 27% -> {"letters":788424,"spec1":27764,"numbers":23129,"spec2":279741,"total":1263667}

Results for example minified libs (average prop=24%)

react min prop: 22% -> {"letters":4500,"spec1":255,"numbers":128,"spec2":1275,"total":6674}
jquery min prop: 26% -> {"letters":56760,"spec1":4985,"numbers":1170,"spec2":20574,"total":89475}
vuejs min prop: 26% -> {"letters":59605,"spec1":4747,"numbers":955,"spec2":22446,"total":93670}
rxjs min prop: 19% -> {"letters":91976,"spec1":5195,"numbers":884,"spec2":22165,"total":127570}
charjs min prop: 25% -> {"letters":147456,"spec1":10746,"numbers":7528,"spec2":46665,"total":226226}
d3.js min prop: 30% -> {"letters":157354,"spec1":13965,"numbers":17320,"spec2":58868,"total":265487}
three.js min prop: 20% -> {"letters":453285,"spec1":19428,"numbers":18070,"spec2":106569,"total":642740}

So in minified libs about 24% of the characters can be written using 3 characters in base8. Non-minified libs have 33% (probably due to the many whitespace characters). But we can assume that users will usually convert minified code to get the jsfuck version (because it is smaller).

So in base8 we have 9 characters (0-7 and backslash) for which we want to find the shortest jsfuck representations. This is my proposal (I add + before each representation because it must be concatenated with the rest of the string):
```
0 -> +(+!![])           //         1 ( 8 chars)
1 -> +!![]              //      true ( 5 chars)
2 -> +(+[])             //         0 ( 6 chars)
3 -> +[][[]]            // undefined ( 7 chars)
4 -> +(+[![]])          //       NaN ( 9 chars)
5 -> +(!![]+!![])       //         2 (12 chars)
6 -> +(![]+[])[+![]]    //         f (15 chars)
7 -> +(!![]+[])[+![]]   //         t (16 chars)
8 -> +![]               //     false ( 4 chars)
```
- I chose the shortest jsf representation for the backslash \ (8), because it appears before each char.
- I use the second-shortest jsf code for the digit 1, because 4-character base8 codes (which make up >75% of minified code) always start with 1 (codes >=200 are useless in typical app source code).
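To see what the table costs in practice, here is a small sketch that copies the fragments from the table, evaluates what each one contributes when concatenated to a string, and sums the jsfuck length of an octal-escaped input. The names `fragments`, `producedText` and `jsfuckLength` are hypothetical and only serve this illustration; the bootstrap that maps the produced text back to digits is not shown.

```javascript
// Symbol -> jsfuck fragment, copied from the table (row 8 is the backslash).
const fragments = {
  "0": "+(+!![])",
  "1": "+!![]",
  "2": "+(+[])",
  "3": "+[][[]]",
  "4": "+(+[![]])",
  "5": "+(!![]+!![])",
  "6": "+(![]+[])[+![]]",
  "7": "+(!![]+[])[+![]]",
  "\\": "+![]",
};

// Text a fragment adds to a string. Indirect eval is used only on the fragments above.
function producedText(fragment) {
  return (0, eval)("[]" + fragment);
}

// Hypothetical cost function: jsfuck characters needed for an octal-escaped string.
function jsfuckLength(octalEscaped) {
  let total = 0;
  for (const symbol of octalEscaped) {
    if (!(symbol in fragments)) throw new Error("unexpected symbol: " + symbol);
    total += fragments[symbol].length;
  }
  return total;
}

console.log(producedText(fragments["4"])); // NaN
console.log(jsfuckLength("\\50"));         // 24, the 3-char code for "("
console.log(jsfuckLength("\\141"));        // 23, the 4-char code for "a"
```

Note that a 3-char code is not automatically cheaper than a 4-char one: the cost depends on which digits appear, which is why the digit 1 gets such a short fragment.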

Character comparison of base8 vs base4 with this tool: in this tool we use an optimized base4 map (built in the same way as for base8; details here #1). After the comparison it is clear that base8 has shorter codes only for these 7 characters: !"#*+JK. However, only 5 of them, !"#*+, have a 3-char base8 representation (which gives us a profit).

Conclusion: let's assume a set of ~95 critical ASCII characters used in typical code (ASCII decimal codes 32-127). In this set only 5 characters, !"#*+, have a 3-char base8 representation and give us a profit (the others give no profit or a loss). 3-char base8 representations make up about ~25% of typical minified library code, and ASCII has ~32 characters with a 3-char base8 representation, so the 5 profitable ones are about 15% of them. That gives 15% * 25% ≈ 4%, so we profit on only about 4-5% of the input code; for the remaining ~95% we get no profit or a loss. It is not worth implementing this. You can test it yourself by typing code into this tool.
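The estimate in the conclusion can be rechecked with a few lines. This is only a back-of-the-envelope sketch: it assumes the 3-char characters occur equally often in real code, which is a simplification, and it takes the ~25% share and the 5 profitable characters from the text above.

```javascript
// Printable ASCII characters whose octal code has 2 digits (3 chars with the backslash).
const threeCharCount = [...Array(128).keys()]
  .filter(code => code >= 32 && code.toString(8).length === 2).length;

const profitable = [...'!"#*+'].length; // characters where base8 beats base4
const shareInMinified = 0.25;           // approximate share of 3-char codes in minified libs

const estimate = (profitable / threeCharCount) * shareInMinified;
console.log(threeCharCount);                    // 32
console.log((estimate * 100).toFixed(1) + "%"); // 3.9%
```

So under these assumptions the gain applies to roughly 4% of the input, in the same ballpark as the rounded figure above.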

kamil-kielczewski commented 4 years ago

Here: #6 is a more promising modification of this approach, which can be checked and may be implemented in the near future.

kamil-kielczewski / small-jsfuck

Alternative coding #4