Investigate shorter squeeze symbols

musictheory / NilScript

Objective-C-style language superset of JavaScript with a tiny, simple runtime


Investigate shorter squeeze symbols #116

Closed: iccir closed this issue 8 years ago

iccir commented 8 years ago

Right now, the squeezer generates identifiers which match the following regex: \$oj\$[A-Za-z][A-Za-z0-9]+

This results in an average length of 7 characters per identifier in our code base. The $oj$ prefix is necessary so that symbolication of stack traces avoids false positives.

It occurred to me that [A-Za-z][0-9][A-Za-z] doesn't match any DOM APIs (although [A-Za-z][0-9] matches x1 and others in SVG).

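Investigate using this pattern instead of the $oj$ prefix (all three characters must be present). The three-character form covers 52 * 10 * 52 = 27040 identifiers (indices 0 through 27039). Identifiers from index 27040 onward would append further characters: [A-Za-z][0-9][A-Za-z][A-Za-z0-9]*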
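The two identifier shapes can be compared directly. This is a minimal sketch, anchored for whole-identifier matching; the constant names are illustrative and not taken from the oj source.

```javascript
// Illustrative names; not taken from the oj source.
const currentPattern  = /^\$oj\$[A-Za-z][A-Za-z0-9]+$/;
const proposedPattern = /^[A-Za-z][0-9][A-Za-z][A-Za-z0-9]*$/;

// letter (52) * digit (10) * letter (52) three-character identifiers
const threeCharCount = 52 * 10 * 52;

console.log(threeCharCount);                 // 27040
console.log(proposedPattern.test("a1a"));    // true
console.log(proposedPattern.test("x1"));     // false: the SVG name is only two characters
console.log(currentPattern.test("$oj$ab"));  // true
console.log(proposedPattern.test("$oj$ab")); // false
```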

iccir commented 8 years ago

Using the current oj 2.0 branch, our test file (tenuto.js) is 336KB uncompressed and 75KB gzipped.

iccir commented 8 years ago

With the shorter squeeze symbols, tenuto.js is 302KB uncompressed and 73KB gzipped. That is a 10% reduction in uncompressed file size, but under 3% once gzipped. Hmmm.
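For reference, the percentages behind those numbers can be worked out as below. The sizes are the ones reported in this thread; the helper name is made up for the example.

```javascript
// Sizes in KB as reported in this thread for tenuto.js.
const before = { raw: 336, gzip: 75 };
const after  = { raw: 302, gzip: 73 };

// Percentage reduction from `from` to `to`, rounded to one decimal place.
// (Hypothetical helper, not part of oj.)
function reductionPercent(from, to) {
    return Math.round((from - to) / from * 1000) / 10;
}

console.log(reductionPercent(before.raw,  after.raw));   // 10.1
console.log(reductionPercent(before.gzip, after.gzip));  // 2.7
```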

iccir commented 8 years ago

Tested with the following in OJSymbolTyper.js:

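function sToBase52(index)
{
    // First three characters: letter, digit, letter (52 * 10 * 52 = 27040 combinations).
    let c0 = sBase52Digits.charAt(index % 52);  index = Math.floor(index / 52);
    let c1 = "" + (index % 10);                 index = Math.floor(index / 10);
    let c2 = sBase52Digits.charAt(index % 52);  index = Math.floor(index / 52);

    let result = c0 + c1 + c2;
    let base = 62;

    // Any remaining value is appended in base 62. This assumes a 62-character
    // digit string (sBase62Digits: letters followed by 0-9); indexing the
    // 52-character string with index % 62 would return "" for 52 through 61.
    while (index > 0) {
        result += sBase62Digits.charAt(index % base);
        index = Math.floor(index / base);
    }

    return result;
}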
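A quick way to sanity-check a generator like this is to run it over a range of indices and confirm that every result matches the proposed pattern and that no two indices collide. The sketch below is a standalone stand-in, not the code in OJSymbolTyper.js: the alphabets and the name shortSymbol are assumptions, and the trailing characters are drawn from a 62-character alphabet, since a 52-character string indexed modulo 62 would yield empty strings for remainders of 52 and above.

```javascript
// Assumed alphabets; the real sBase52Digits is defined elsewhere in OJSymbolTyper.js.
const letters      = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const alphanumeric = letters + "0123456789";

// Hypothetical stand-in for the generator under test.
function shortSymbol(index) {
    const c0 = letters.charAt(index % 52);  index = Math.floor(index / 52);
    const c1 = String(index % 10);          index = Math.floor(index / 10);
    const c2 = letters.charAt(index % 52);  index = Math.floor(index / 52);

    let result = c0 + c1 + c2;

    // Remaining value is appended in base 62, least significant digit first.
    while (index > 0) {
        result += alphanumeric.charAt(index % 62);
        index = Math.floor(index / 62);
    }
    return result;
}

const pattern = /^[A-Za-z][0-9][A-Za-z][A-Za-z0-9]*$/;
const seen = new Set();
const count = 100000;
let allMatch = true;

for (let i = 0; i < count; i++) {
    const s = shortSymbol(i);
    if (!pattern.test(s)) allMatch = false;
    seen.add(s);
}

console.log(shortSymbol(0));       // "a0a"
console.log(shortSymbol(27039));   // "Z9Z"
console.log(shortSymbol(27040));   // "a0ab"
console.log(allMatch, seen.size === count);
```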

iccir commented 8 years ago

One issue with this approach is that UglifyJS may also generate a symbol in the "a1a" format.
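To illustrate the risk: if a minifier happens to emit a name of the same shape, symbolication cannot tell it apart from a squeezed symbol. The following is a sketch only; the symbol maps, the original name setNeedsDisplay, the stack frame, and the symbolicate helper are all invented for the example and do not reflect the actual tooling.

```javascript
// Hypothetical symbol maps: squeezed name -> original name.
const shortMap    = new Map([["a1a", "setNeedsDisplay"]]);
const prefixedMap = new Map([["$oj$ab", "setNeedsDisplay"]]);

const shortPattern    = /\b[A-Za-z][0-9][A-Za-z][A-Za-z0-9]*\b/g;
const prefixedPattern = /\$oj\$[A-Za-z][A-Za-z0-9]+/g;

// Replace every token that matches the pattern and is present in the map.
function symbolicate(text, pattern, map) {
    return text.replace(pattern, (match) => map.has(match) ? map.get(match) : match);
}

// Invented stack frame in which "a1a" is a name chosen by a minifier,
// not by the squeezer.
const frame = "at a1a (tenuto.js:1:2345)";

console.log(symbolicate(frame, shortPattern, shortMap));
// "at setNeedsDisplay (tenuto.js:1:2345)"  (false positive)

console.log(symbolicate(frame, prefixedPattern, prefixedMap));
// "at a1a (tenuto.js:1:2345)"  (left untouched)
```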

iccir commented 8 years ago

This would also require changes to our internal tooling for symbolication. I don't believe this is worth it.