peggyjs / peggy

Peggy: Parser generator for JavaScript
https://peggyjs.org/
MIT License
914 stars 65 forks source link

Fix all uses of String.prototype.substr #408

Open hildjj opened 1 year ago

hildjj commented 1 year ago

See the deprecation warning on MDN.

It looks like we can substitute String.prototype.slice, including in the generated code. Think about edge cases of NaN, negative, and swapped start and end in all cases.

reverofevil commented 1 year ago

Why? Half the web depends on it, and it's often more convenient for parsers than slice or substring. At the very least it shouldn't block any releases.

hildjj commented 1 year ago

I'm not sure why it's more convenient than slice or substring? Can you say more?

reverofevil commented 1 year ago

To check that a contains b starting with a certain position (with correct handling of potential end of string) is usually done as a.substr(peg$pos, b.length) === b. The other two methods take the end position instead of length, so with no substr it should be computed as peg$pos + b.length for no good reason.

As I understand, substr will be there forever in JS implementations along with ordering of fields in objects, <marquee> tag, with operator and quirky behavior of sort, because

TLDR I don't see how following this deprecation might improve anything.

markw65 commented 1 year ago

Not sure if this was just abandoned on the grounds that substr will never be removed, but I did some investigation.

In summary, it seems like switching over is easy, there's a code size win (not a big deal), and maybe a very small performance penalty, but not measurable in real cases.

Here's the test case, in case I'm making a systematic error of some sort:

const fs = require("node:fs/promises");

function wrapper(input) {
  let currPos = 0;
  function substr(n) {
    return input.slice(currPos, currPos + n);
  }

  function parseOld() {
    currPos = 0;
    while (input.substr(currPos, 5) !== "") {
      currPos += 5;
    }
  }
  function parseNew() {
    currPos = 0;
    while (substr(5) !== "") {
      currPos += 5;
    }
  }
  return [parseOld, parseNew];
}

fs.readFile("<a 20k text file>").then(
  (data) => {
    const [parseOld, parseNew] = wrapper(data.toString());
    function test(testFn, which) {
      const start = Date.now();
      for (let i = 0; i < 10000; i++) {
        testFn();
      }
      const end = Date.now();
      console.log(`${which} Took ${end - start}ms`);
    }
    test(parseOld, "parseOld");
    test(parseNew, "parseNew");
  }
);
mikeaustin commented 11 months ago

I agree that substr() will live forever, but there are ways of writing "modern" JavaScript. .slice() is mirrored by Array, which makes it more consistent. You can still use "var" and "==" all you want, but I wouldn't allow it in a recent PR. I think modernizing a code-base is a good goal to have. It's just confusing to have 3 functions that basically do the same thing.

https://stackoverflow.com/questions/2243824/what-is-the-difference-between-string-slice-and-string-substring