tc39 / proposal-reversible-string-split

String.prototype.splitn for ECMAScript
MIT License

what are some examples of things that are difficult to do with the existing split method? #7

Open michaelficarra opened 2 years ago

michaelficarra commented 2 years ago

I'm struggling with the motivation for this proposal. It seems like it is a solution in search of a problem. "Other programming languages do it this way" is not significantly motivating if there's an easy way to accomplish the task with the existing method. Do you have any examples where it's hard to do it with what we provide today? Is it just about increased readability? Can you show what a polyfill of your proposed method would look like?

lucacasonato commented 2 years ago

Currently "prefix splits" are rather annoying to do. Here is an example:

// test.ini
/*
key = value
other_key = 'value contains an = sign'
*/

// Current JavaScript
const ini = Deno.readTextFileSync("./test.ini");
const entries = ini.split("\n").map((line) => {
  const [key, ...rest] = line.split("=");
  return [key, rest.join("=")];
});

// With this proposal
const ini = Deno.readTextFileSync("./test.ini");
const entries = ini.split("\n").map((line) => line.splitn("=", 2));

This type of splitting is very common. It happens while parsing cookies, mime types, ini files, http 1.1 headers, and many more.

As for a polyfill, there are multiple possible implementations, based either on the existing String.prototype.split or on String.prototype.indexOf. If you want me to elaborate on the actual implementation of that polyfill, please let me know, but I think it is relatively straightforward.

michaelficarra commented 2 years ago

Yes, can you elaborate on it? I want to know how difficult it would be to write this helper where needed.

lucacasonato commented 2 years ago

Here is an implementation of splitn as proposed:

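function splitn(str, sep, n) {
  if (n < 1) {
    throw new TypeError("n must be a positive non-zero integer");
  }
  const res = [];
  let start = 0; // index where the next piece begins
  // Cut off at most n - 1 pieces; the final piece is the unsplit remainder.
  for (let i = 0; i < n - 1; i++) {
    const index = str.indexOf(sep, start);
    if (index === -1) {
      break; // no more separators
    }
    res.push(str.slice(start, index));
    start = index + sep.length;
  }
  res.push(str.slice(start)); // everything after the last consumed separator
  return res;
}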

const str = "a|b|c|d|e|f";
const sep = "|";
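console.log(splitn(str, sep, 2)); // ["a", "b|c|d|e|f"]
console.log(splitn(str, sep, 1)); // ["a|b|c|d|e|f"]
console.log(splitn(str, sep, 5)); // ["a", "b", "c", "d", "e|f"]
console.log(splitn(str, sep, 6)); // ["a", "b", "c", "d", "e", "f"]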

https://gist.githubusercontent.com/lucacasonato/ed218a61e617c51ed8f19d3bf6b9d89a/raw/ae7a295c5e300e08d6a284017cfe2122a0e99869/splitn.js

lucacasonato commented 2 years ago

Nothing is particularly difficult to do with the existing String.prototype.split. The split + join approach shown above works, but it splits the entire string and then rejoins most of it, which is wasteful, especially for longer strings. The implementation using indexOf is much more efficient, but also much more involved to write.

Having this as a first class primitive would make the easy approach also the "correct" (fast + efficient) approach.
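
To make the trade-off concrete, here is a rough sketch of the split + join approach generalized to an arbitrary n. The name splitnViaJoin is made up for illustration and is not part of the proposal; the sketch assumes a string separator, since joining with a regex would not work.

```javascript
// Hypothetical helper for illustration only; assumes `sep` is a string.
function splitnViaJoin(str, sep, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new TypeError("n must be a positive integer");
  }
  // The whole string is split up front, even when only the first piece matters.
  const parts = str.split(sep);
  if (parts.length <= n) {
    return parts;
  }
  // Keep the first n - 1 pieces and glue the rest back together.
  const head = parts.slice(0, n - 1);
  const tail = parts.slice(n - 1).join(sep);
  return [...head, tail];
}

console.log(splitnViaJoin("a|b|c|d|e|f", "|", 2)); // ["a", "b|c|d|e|f"]
console.log(splitnViaJoin("a|b|c|d|e|f", "|", 5)); // ["a", "b", "c", "d", "e|f"]
```

It is short, but every separator in the input is found and every piece is allocated before the tail is rebuilt, which is the extra work the indexOf version avoids.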

theScottyJam commented 2 years ago

I'd also point out that the split + join solution doesn't work if you're using a regex to do the split - a regex split will cause information to be lost (what specific characters were used in the separators), and that can't be recovered.
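
A small sketch of that information loss, using made-up sample text with mixed line endings. The variable names are only illustrative; the second half shows the manual workaround with RegExp.prototype.exec that is needed today.

```javascript
// Sample input (invented for this example): CRLF after the first two lines, LF at the end.
const text = "#!/usr/bin/env node\r\nconsole.log(1);\r\nconsole.log(2);\n";

// What we actually want: everything after the first line break, untouched.
const expected = "console.log(1);\r\nconsole.log(2);\n";

// split + join: the original separators are gone, so we have to guess one.
const [firstLine, ...rest] = text.split(/\r?\n/);
const rejoined = rest.join("\n");
console.log(rejoined === expected); // false, the "\r\n" became "\n"

// Manual workaround: find the first match and slice after it.
const match = /\r?\n/.exec(text);
const remainder = match === null ? "" : text.slice(match.index + match[0].length);
console.log(remainder === expected); // true
```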

So, if you want to do something trivial, like removing the first line of a file (perhaps to get rid of the shebang at the top), you could easily do so in a platform-independent way with this proposal, as follows:

fileContents.splitn(/\r?\n/, 2)[1]

hax commented 2 years ago

I said this at the meeting; adding it here so others know.

I have been frustrated by the "split N times" problem 🤪. The main use case is parsing specially formatted strings that start with a pattern repeated n times, where the remainder needs further parsing. "Prefix splits" are a special (and the most common) case of this, and consuming n lines, as in @theScottyJam's example, is definitely another common case.

split + join is not suitable for this kind of parsing: for potentially long strings it just wastes CPU and memory. And as @theScottyJam said, it is even worse, or outright impossible, for regex splits.
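
For the regex case, a helper along these lines could consume the first n - 1 separators and leave the remainder untouched. This is only a sketch under stated assumptions: splitnRegex is a hypothetical name, the pattern is assumed not to be sticky and never to match the empty string, and capture groups are ignored (unlike String.prototype.split, which includes them in the result).

```javascript
// Hypothetical helper for illustration; not the proposal's specified behavior.
function splitnRegex(str, pattern, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new TypeError("n must be a positive integer");
  }
  // Work on a global copy so lastIndex is honored and the caller's regex is not mutated.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const res = [];
  let start = 0;
  while (res.length < n - 1) {
    re.lastIndex = start;
    const match = re.exec(str);
    if (match === null) {
      break; // no more separators
    }
    if (match[0].length === 0) {
      throw new Error("empty matches are not handled in this sketch");
    }
    res.push(str.slice(start, match.index));
    start = match.index + match[0].length;
  }
  // The remainder is sliced from the original string, separators intact.
  res.push(str.slice(start));
  return res;
}

console.log(splitnRegex("one\r\ntwo\nthree\r\nfour", /\r?\n/, 3));
// → ["one", "two", "three\r\nfour"]
```

The string is scanned only as far as the last consumed separator, so the cost depends on the prefix being parsed rather than on the total input length.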