Fix all uses of String.prototype.substr

hildjj commented 1 year ago

See the deprecation warning on MDN.

It looks like we can substitute String.prototype.slice, including in the generated code. Think about edge cases of NaN, negative, and swapped start and end in all cases.

reverofevil commented 1 year ago

Why? Half the web depends on it, and it's often more convenient for parsers than slice or substring. At the very least it shouldn't block any releases.

hildjj commented 1 year ago

I'm not sure why it's more convenient than slice or substring? Can you say more?

reverofevil commented 1 year ago

To check that a contains b starting with a certain position (with correct handling of potential end of string) is usually done as a.substr(peg$pos, b.length) === b. The other two methods take the end position instead of length, so with no substr it should be computed as peg$pos + b.length for no good reason.

As I understand, substr will be there forever in JS implementations along with ordering of fields in objects, <marquee> tag, with operator and quirky behavior of sort, because

otherwise early websites would stop operating;
there is no graceful degradation in case it ceases to exist;
removing it from an implementation of a JS engine produces barely any profit;
prototypes of globals can't really change in strict mode.

TLDR I don't see how following this deprecation might improve anything.

markw65 commented 1 year ago

Not sure if this was just abandoned on the grounds that substr will never be removed, but I did some investigation.

the only place substr is used now is in generated code (and one reference from docs/vendor/codemirror)
if I add a helper function, peg$substr at the same level as peg$currPos that just takes a length parameter, and returns input.slice(peg$currPos, peg$currPos + length), and use that instead of input.substr everywhere, the generated parser is smaller (peg$substr(5) vs input.substr(peg$currPos, 5)), and I can't measure a difference in performance in any of the parsers that I've generated (and I've tried with parser/input combos that take hundreds of ms).
I wrote a small test case that sets up something comparable to the parser, and just calls my new substr, or input.substr in a loop, and there I see a very small slowdown. Maybe as much as 10%, but there's so much noise it's hard to tell (on some runs my helper is actually faster, but its slower more often than not).

In summary, it seems like switching over is easy, there's a code size win (not a big deal), and maybe a very small performance penalty, but not measurable in real cases.

Here's the test case, in case I'm making a systematic error of some sort:

const fs = require("node:fs/promises");

function wrapper(input) {
  let currPos = 0;
  function substr(n) {
    return input.slice(currPos, currPos + n);
  }

  function parseOld() {
    currPos = 0;
    while (input.substr(currPos, 5) !== "") {
      currPos += 5;
    }
  }
  function parseNew() {
    currPos = 0;
    while (substr(5) !== "") {
      currPos += 5;
    }
  }
  return [parseOld, parseNew];
}

fs.readFile("<a 20k text file>").then(
  (data) => {
    const [parseOld, parseNew] = wrapper(data.toString());
    function test(testFn, which) {
      const start = Date.now();
      for (let i = 0; i < 10000; i++) {
        testFn();
      }
      const end = Date.now();
      console.log(`${which} Took ${end - start}ms`);
    }
    test(parseOld, "parseOld");
    test(parseNew, "parseNew");
  }
);

mikeaustin commented 11 months ago

I agree that substr() will live forever, but there are ways of writing "modern" JavaScript. .slice() is mirrored by Array, which makes it more consistent. You can still use "var" and "==" all you want, but I wouldn't allow it in a recent PR. I think modernizing a code-base is a good goal to have. It's just confusing to have 3 functions that basically do the same thing.

https://stackoverflow.com/questions/2243824/what-is-the-difference-between-string-slice-and-string-substring

peggyjs / peggy

Fix all uses of String.prototype.substr #408