Slice Extensibility

rbuckton commented 6 years ago

It would be great if you could specify how the slice notation should apply to an object, perhaps via a Symbol.slice method:

interface Array<T> {
  [Symbol.slice](start?: number, end?: number, step?: number): Array<T>;
}
interface Int32Array {
  [Symbol.slice](start?: number, end?: number, step?: number): Int32Array;
}
// etc.
interface String {
  [Symbol.slice](start?: number, end?: number, step?: number): string;
}

Then syntax like this:

array[1:3:2]

Becomes this at runtime:

array[Symbol.slice](1, 3, 2)

The advantage of this is that we can specify the syntax in terms of a method. That lets us specify that slice notation on strings works over code points rather than UTF-16 code units, and lets us define how slice notation behaves on typed arrays.
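Here is a rough sketch of the desugaring in today's JavaScript. It rests on a few assumptions:

- sliceSymbol is a hypothetical stand-in for the proposed Symbol.slice. No such well-known symbol exists yet.
- arraySlice and stringSlice are illustrative helpers. They assume an exclusive end and a positive step, and they do not handle negative indices.

Spreading a string iterates code points, which is one way the code-point behavior for strings could be obtained.

```javascript
// Hypothetical stand-in for the proposed well-known symbol Symbol.slice.
const sliceSymbol = Symbol("Symbol.slice");

// Stepped slice for array-likes: start inclusive, end exclusive, positive step only.
// Negative (from-the-end) indices are not handled in this sketch.
function arraySlice(start = 0, end = this.length, step = 1) {
  if (step <= 0) throw new RangeError("this sketch only supports a positive step");
  const result = [];
  for (let i = start; i < end && i < this.length; i += step) {
    result.push(this[i]);
  }
  return result;
}

// Code-point slice for strings: spreading a string iterates code points,
// so surrogate pairs are never split.
function stringSlice(start = 0, end, step = 1) {
  const codePoints = [...this];
  return arraySlice.call(codePoints, start, end ?? codePoints.length, step).join("");
}

// Attach the method to one array instead of patching Array.prototype.
const array = Object.assign([10, 20, 30, 40, 50], { [sliceSymbol]: arraySlice });

// array[1:3:2] would desugar to:
array[sliceSymbol](1, 3, 2); // [20]
array[sliceSymbol](0, 5, 2); // [10, 30, 50]

stringSlice.call("a😀b", 0, 2); // "a😀"
"a😀b".slice(0, 2);             // "a\uD83D" (half of a surrogate pair)
```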

In addition, users can define how the slice notation applies to their own classes:

// slice on custom class
class Vector {
  ...
  [Symbol.slice](start, end, step) {
    ...
  }
}

// other interesting use cases
class Range {
  constructor(start, end, step) {
    this.start= start;
    this.end = end;
    this.step = step;
  }
  apply(obj) {
    return obj[Symbol.slice](this.start, this.end, this.step);
  }
  static [Symbol.slice](start, end, step) {
    return new Range(start, end, step);
  }
}

let range = Range[1:3:2];
range.start; // 1
range.end; // 3
range.step; // 2
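Below is a runnable approximation of the Range and custom-class examples. It assumes a hypothetical sliceSymbol in place of Symbol.slice. The original Vector body is elided, so the Vector class here is filled in with an illustrative stepped slice.

```javascript
// Hypothetical stand-in for the proposed Symbol.slice.
const sliceSymbol = Symbol("Symbol.slice");

class Range {
  constructor(start, end, step) {
    this.start = start;
    this.end = end;
    this.step = step;
  }
  // Apply this range to any object that implements the slice protocol.
  apply(obj) {
    return obj[sliceSymbol](this.start, this.end, this.step);
  }
  // Static hook: Range[1:3:2] would desugar to Range[sliceSymbol](1, 3, 2).
  static [sliceSymbol](start, end, step) {
    return new Range(start, end, step);
  }
}

// Hypothetical custom list class opting into the protocol (end exclusive, positive step).
class Vector {
  constructor(...items) {
    this.items = items;
  }
  [sliceSymbol](start = 0, end = this.items.length, step = 1) {
    const picked = [];
    for (let i = start; i < end && i < this.items.length; i += step) picked.push(this.items[i]);
    return new Vector(...picked);
  }
}

const range = Range[sliceSymbol](1, 3, 2); // stands in for Range[1:3:2]
range.start; // 1
range.apply(new Vector(10, 20, 30, 40)).items; // [20]
```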

littledan commented 6 years ago

Becomes this at runtime: array[Symbol.slice](1:3:2)

Was this meant to be array[Symbol.slice](1, 3, 2)?

rbuckton commented 6 years ago

Yes, thanks. I've updated the issue.

ljharb commented 6 years ago

"slice" to me makes no sense as a concept applied to things that aren't lists (such as arrays, strings, Sets) or things without indexes.

If we want a generic extraction API, we should call it something else, and it shouldn't solely use numbers.

rbuckton commented 6 years ago

@ljharb we can bikeshed on Symbol.slice, but my point is that Array, String, and Set aren't necessarily the only "list"-like things in JavaScript, as users can define their own "list"-like classes that would like to use this feature. The name Symbol.slice was chosen in this case as the proposal defines this as "slice notation".

rbuckton commented 6 years ago

I wonder if we might want to dust off the Symbol.geti/Symbol.seti proposal as well, and consider adding a Range primitive with literal syntax:

// built-in `Range` class
class Range {
  constructor(start = 0, end = -1, step = 1) {
    this.start = start;
    this.end = end;
    this.step = step;
  }
  [Symbol.geti](obj) {
    return obj[Symbol.slice](this.start, this.end, this.step);
  }
  [Symbol.seti](obj, values) {
    return obj[Symbol.splice](this.start, this.end, values);
  }
}

// Literal `Range` syntax:
let range = 1:3; 
// -> range = new Range(1, 3);

// Get a range
let source = [1, 2, 3, 4, 5];
let chunk = source[range];
// -> chunk = range[Symbol.geti](source);
// -> chunk = source[Symbol.slice](1, 3, 1);
// -> chunk = [2, 3]

source[range] = [7, 8, 9];
// -> range[Symbol.seti](source, [7, 8, 9])
// -> source[Symbol.splice](1, 3, [7, 8, 9])

console.log(source); // 1, 7, 8, 9, 4, 5

While there would definitely be some indirection under the covers, it's very flexible, consistent, and cohesive.

One caveat is that a literal range syntax would be ambiguous in a conditional, so you would have to require parens for a literal range expression (e.g. x ? (1:2) : (3:4)).
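The @@geti/@@seti indirection can be sketched in today's JavaScript, with these assumptions:

- geti, seti, elementGet and elementSet are hypothetical.
- The two helpers emulate what source[range] and source[range] = values would do if element access deferred to the key.
- For brevity, the Range methods call Array.prototype.slice and splice directly instead of the proposed @@slice/@@splice, and step is ignored.

```javascript
// Hypothetical stand-ins for the proposed well-known symbols.
const geti = Symbol("Symbol.geti");
const seti = Symbol("Symbol.seti");

// Emulate obj[key] / obj[key] = value when the key opts into the inverted protocol.
function elementGet(obj, key) {
  return key !== null && typeof key === "object" && geti in key ? key[geti](obj) : obj[key];
}
function elementSet(obj, key, value) {
  if (key !== null && typeof key === "object" && seti in key) key[seti](obj, value);
  else obj[key] = value;
}

class Range {
  constructor(start = 0, end = -1, step = 1) {
    Object.assign(this, { start, end, step });
  }
  // Read: elements start..end (end exclusive); step is ignored here.
  [geti](obj) {
    return obj.slice(this.start, this.end);
  }
  // Write: replace elements start..end with values, i.e. splice(start, end - start, ...values).
  [seti](obj, values) {
    obj.splice(this.start, this.end - this.start, ...values);
  }
}

const source = [1, 2, 3, 4, 5];
const range = new Range(1, 3);             // stands in for the literal 1:3
const chunk = elementGet(source, range);   // [2, 3]
elementSet(source, range, [7, 8, 9]);      // stands in for source[range] = [7, 8, 9]
// source is now [1, 7, 8, 9, 4, 5]
```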

caub commented 6 years ago

@rbuckton nice idea

Most other languages have the start : end[ : step] syntax (sometimes start[ : step] : end), but I don't find the step argument very useful. Replacing it with a callback would have benefits (performance, etc.), even if it looks weird at first glance.

1:9:2 would become 0:4:i=>1+i*2

(1:10).map(() => 100*Math.random()) would become 1:10:() => 100*Math.random()

rbuckton commented 6 years ago

1:9:2 would become 0:4:i=>1+i*2

This seems like it would be too complicated for the array selector case, compared to this:

const ints = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const odds = ints[0::2]; // [1, 3, 5, 7, 9];
const evens = ints[1::2]; // [2, 4, 6, 8, 10];

Besides, you could already map with Array.from:

const odds = Array.from([0:5], i => (i * 2) + 1);
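Until slice notation exists, the stepped selections and the mapped range above can be approximated as follows. stepSlice is a hypothetical helper that assumes a positive step. Array.from over an array-like { length: n } stands in for the range literal.

```javascript
// Hypothetical helper approximating arr[start:end:step] for a positive step.
function stepSlice(arr, start = 0, end = arr.length, step = 1) {
  return arr.slice(start, end).filter((_, i) => i % step === 0);
}

const ints = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const odds = stepSlice(ints, 0, undefined, 2);  // ints[0::2] -> [1, 3, 5, 7, 9]
const evens = stepSlice(ints, 1, undefined, 2); // ints[1::2] -> [2, 4, 6, 8, 10]

// Array.from([0:5], i => (i * 2) + 1) today:
const odds2 = Array.from({ length: 5 }, (_, i) => i * 2 + 1); // [1, 3, 5, 7, 9]
```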

gsathya commented 6 years ago

I wonder if we might want to dust off the Symbol.geti/Symbol.seti proposal as well, and consider adding a Range primitive with literal syntax:

The problem with adding a new Range primitive is that it would complicate GetValue/PutValue, regressing performance for all property access.

The win with just the slice notation is that it's pure syntax, which the parser can directly rewrite into a call to Symbol.slice, so we can reuse all the magic sauce we have for optimizing regular property access. You only pay for the call to Symbol.slice when you use slice notation, not on every property access. Also, since this is just syntax, we can easily optimize it with inline caches (ICs).

rbuckton commented 6 years ago

The problem with adding a new Range primitive is that it would complicate GetValue/PutValue, regressing performance for all property access.

Hosts like v8 and Chakra already optimize property access and have opt-outs for non-PropertyKey values (e.g. obj.foo is fast while obj[{ toString() { return "foo"; } }] is slow, but both work).
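Both access forms in the comment above already work today. A non-PropertyKey value is converted with ToPropertyKey, which for an ordinary object ends up calling its toString method. That conversion is the extra work the engine has to detect before it can take the fast path.

```javascript
const obj = { foo: 42 };
const key = { toString() { return "foo"; } };

obj.foo;  // 42
obj[key]; // 42: the object key is converted to the string "foo" first
```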

gsathya commented 6 years ago

There's always at least an extra type check (load + jump) required to bailout on the fast path.

caub commented 6 years ago

Also, the name Range is already taken by the text selection API: https://developer.mozilla.org/en-US/docs/Web/API/Range

and document.createRange as well

littledan commented 6 years ago

In addition to the issues Sathya raised, it seems somewhat complicated grammatically to give : yet another meaning outside of a somewhat restricted context, given its other usages.

caub commented 6 years ago

This possible new meaning for : would be restricted to inside the array literal notation. I don't think it complicates things much for the parser, does it?

@rbuckton I think needing a .map right afterward (or Array.from, like you said, though it's quite verbose) is much more frequent than needing this step parameter, which is just a less powerful .filter. But in my proposal it'd be confusing to pass a function expression as the 3rd parameter, above all if the first 2 only accept number literals (#26), so I'm fine with this [start:end:step] after all.

gsathya commented 6 years ago

This possible new meaning for : would be restricted to inside the array literal notation. I don't think it complicates things much for the parser, does it?

That's what I'm proposing, but not what @rbuckton seems to want according to https://github.com/tc39/proposal-slice-notation/issues/19#issuecomment-415995994.

rbuckton commented 6 years ago

@gsathya: I assume you are referring to this: let range = 1:3? In effect I'm saying it would be a "nice to have". If we ever did decide to add https://github.com/tc39/proposal-slice-notation/issues/19#issuecomment-415995994, it could be achieved as a series of follow-on proposals:

Main Proposal:
- Add syntax for slice notation in an element-access position: a[start:end:step].
Follow-on Proposals:
- Allow slice notation to be extensible via @@slice and @@splice:
- a[1:3] -> a[@@slice](1, 3, 1)
- a[1:3] = b -> a[@@splice](1, 3, b)
- Add support for @@geti and @@seti:
- a[x] -> x[@@geti](a)
- a[x] = b -> x[@@seti](a, b)
- Add syntax for range literals (e.g. 1:3):
  1. a[1:3]
  2. a[new Range(1, 3, 1)]
  3. new Range(1, 3, 1)[@@geti](a)
  4. a[@@slice](1, 3, 1)

To avoid ambiguities with conditionals and labels, we could restrict ranges to element access (a[1:3]) and parenthesized expressions ((1:3)).

Also, if Range (or whatever name we choose) supports @@iterator, you could easily create arrays of ranges, or for..of over a range:

// create array
const ar = [...(1:5)]; // [1, 2, 3, 4]

// or, allow without parens in array
const ar = [...1:5]; // [1, 2, 3, 4]

for (const x of (0:10)) { // 0, 1, 2, ..., 9
}
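This is a minimal sketch of an iterable Range. It assumes a hypothetical user-land class, not the built-in proposed here, with an exclusive end and a positive step. Implementing Symbol.iterator as a generator is what makes spread and for..of work.

```javascript
// Hypothetical iterable Range (end exclusive, positive step assumed).
class Range {
  constructor(start, end, step = 1) {
    this.start = start;
    this.end = end;
    this.step = step;
  }
  *[Symbol.iterator]() {
    for (let x = this.start; x < this.end; x += this.step) yield x;
  }
}

const ar = [...new Range(1, 5)]; // stands in for [...(1:5)] -> [1, 2, 3, 4]

const seen = [];
for (const x of new Range(0, 10)) seen.push(x); // stands in for (0:10): 0, 1, 2, ..., 9
```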

caub commented 6 years ago

I don't expect [...1:4, ...6:10] cases to be frequently used, but it's indeed nicer than [...[1:4], ...[6:10]]

for (const x of (0:10)) doesn't simplify much over for (const x of [0:10])

I see how this range literal is fitting well in this proposal, this looks great

rbuckton commented 6 years ago

for (const x of (0:10)) doesn't simplify much over for (const x of [0:10])

Except that iterating over a Range would be far less memory intensive:

for (const x of (0:Number.MAX_SAFE_INTEGER)) {
  // only need to hold four numbers (start, end, increment, and current) and the Range object in memory
}

for (const x of [0:Number.MAX_SAFE_INTEGER]) {
  // need to hold an Array object with 9,007,199,254,740,991 numbers in memory!
}
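The memory argument can be illustrated with a generator, assuming the range iterates lazily like lazyRange below (a hypothetical helper). Breaking out early only ever produces a handful of numbers. An eager array of that size cannot even be created, because array lengths are limited to 2^32 - 1.

```javascript
// Hypothetical lazy range: yields start, start + step, ... up to (excluding) end.
function* lazyRange(start, end, step = 1) {
  for (let x = start; x < end; x += step) yield x;
}

const firstFew = [];
for (const x of lazyRange(0, Number.MAX_SAFE_INTEGER)) {
  if (x >= 3) break; // only four values were ever produced
  firstFew.push(x);
}
// firstFew is [0, 1, 2]

// An eager array of that length cannot be allocated at all:
let eagerFailed = false;
try {
  new Array(Number.MAX_SAFE_INTEGER);
} catch (e) {
  eagerFailed = e instanceof RangeError; // invalid array length
}
```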

caub commented 6 years ago

@rbuckton would the range literal expose methods like .map, .filter?

This would be interesting:

(1:8).map(x => x**2)
(0:5).map(i => (0:5).map(j => 5*i+j))

If not, it's still possible to spread it of course

[...1:8].map(x => x**2)
[...0:5].map(i => [...0:5].map(j => 5*i+j))
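For comparison, the two mapped ranges above can be written today with Array.from over an array-like with a length. This assumes the range end is exclusive, so 1:8 covers 1 through 7.

```javascript
// (1:8).map(x => x**2):
const squares = Array.from({ length: 7 }, (_, i) => (i + 1) ** 2); // [1, 4, 9, 16, 25, 36, 49]

// (0:5).map(i => (0:5).map(j => 5*i+j)):
const grid = Array.from({ length: 5 }, (_, i) =>
  Array.from({ length: 5 }, (_, j) => 5 * i + j)
);
// grid[1] is [5, 6, 7, 8, 9]
```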

rbuckton commented 6 years ago

[...] would the range literal expose methods [...]

No, I wouldn't expect it to.

caub commented 6 years ago

@rbuckton Slice (e.g. new Slice(1, 3) for (1:3)) could maybe be a good name for this new literal's constructor, since Range is taken.

Should we make a PR for this, to sum it up?

rbuckton commented 6 years ago

Slice.prototype[@@splice] might seem a little strange though. What about Interval (https://en.wikipedia.org/wiki/Interval_(mathematics))?

ljharb commented 6 years ago

I'm confused, why would we want a splice symbol? splice is an abomination.

rbuckton commented 6 years ago

It seems odd to have x = ar[1:3] without the inverse ar[1:3] = x.

ljharb commented 6 years ago

I find the former intuitively useful and the latter violently unpalatable; I don't see an advantage to syntax that creates a ton of observable operations and also represents what's become a very unidiomatic pattern (optional chaining has no plans to add optional assignment, for comparison).

hax commented 6 years ago

I feel x[range] could cause confusion. JS programmers always treat x[y] as a simple property lookup and I believe we'd better keep it simple. Instead of inventing new syntax let range = 1:3; let chunk = source[range]; I'd rather simply use let range = [1, 3]; let chunk = source.slice(...range);.

caub commented 6 years ago

A syntactic expression (foo[1:3]) is always better than an 'API'/dynamic one (foo.slice(1,3)), just as [1, 2] is better than Array(1, 2): it can throw if it's malformed, and I guess it can allow performance optimizations.

The biggest benefit, for me at least, is the range creation discussed in this issue, because Array.from({length: ..}, (_, i) => ...) has become common. For example, there are 13 occurrences of Array.from({ length in https://github.com/30-seconds/30-seconds-of-code. And it's awkward, error-prone, impractical, and verbose: simply a bad sign (https://github.com/graphql/graphql.github.io/pull/456#discussion_r199057305). So [...0:10] would be a great addition to the language.

hax commented 6 years ago

Don't get me wrong. I think the foo[1:3] form is acceptable syntactic sugar. But I don't think foo[range] is a good idea, just as the current proposal does not allow foo[complexExpression1:complexExpression2].

caub commented 6 years ago

@rbuckton what does the i mean in @@geti, @@seti?

Other thing, for @@splice:

const a=[]; a[2:6:2] = 4; // a will be [undefined,undefined,4,undefined,4] or still [] ?
const a=[1,1,1,1]; a[1:3] = [2, 4]; // would an array be 'spread'?
// so a would be [1,2,4,1]? or [1,[2,4],[2,4],1]

I guess the latter, so it could only assign the same value to a range of indexes.

Concerning the follow-on proposals:

2.1 a[1:3] -> a[@@slice](1, 3, 1) and a[1:3] = b -> a[@@splice](1, 3, b): I guess you mean a[@@splice](1, 3, 1, b), or it could maybe also accept a[@@splice]((1:3), b) or a[@@splice](new Range(1, 3, 1), b)

2.2 I find @@geti and @@seti redundant with 2.1, since they just swap the roles of the Range and the target array. I don't think Range should have this responsibility; it should just be 'read-only' and iterable.

Personally I'd drop them (and so 2.3 iii as well).

For the naming, Interval sounds too generic, since this is specifically an integer interval. Sequence could fit, but I think we should keep Range/range, and maybe attach it to Array (new Array[Symbol.range](1, 8, 2)) to avoid any conflict with the DOM Range.

Hope we can merge that into the proposal; I was trying to see how to implement a babel plugin for it.

rbuckton commented 6 years ago

@caub

what does the i mean in @@geti, @@seti?

In this case, "inverted". Basically, the semantics of @@geti would invert the [[Get]] operation from obj[key] to key[@@geti](obj), giving key the ability to determine how to get the value from obj.

A good example for @@geti and @@seti would be WeakMaps:

WeakMap.prototype[@@geti] = function (target) { return this.get(target); }
WeakMap.prototype[@@seti] = function (target, value) { this.set(target, value); }

const weakPropertyX = new WeakMap();
const obj = {};
obj[weakPropertyX] = 1;
console.log(obj[weakPropertyX]); // prints 1
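Here is a runnable approximation of the WeakMap example. geti, seti, elementGet and elementSet are hypothetical stand-ins for the proposed symbols and for the altered element-access semantics. The methods are installed on a single WeakMap instance rather than on WeakMap.prototype. Without the proposal, obj[weakPropertyX] = 1 would instead create a string-keyed property named "[object WeakMap]".

```javascript
// Hypothetical stand-ins; neither symbol exists in ECMAScript.
const geti = Symbol("Symbol.geti");
const seti = Symbol("Symbol.seti");

// Emulate the proposed element access: obj[key] -> key[@@geti](obj) when the key opts in.
function elementGet(obj, key) {
  return key !== null && typeof key === "object" && geti in key ? key[geti](obj) : obj[key];
}
function elementSet(obj, key, value) {
  if (key !== null && typeof key === "object" && seti in key) key[seti](obj, value);
  else obj[key] = value;
}

const weakPropertyX = new WeakMap();
weakPropertyX[geti] = function (target) { return this.get(target); };
weakPropertyX[seti] = function (target, value) { this.set(target, value); };

const obj = {};
elementSet(obj, weakPropertyX, 1);           // obj[weakPropertyX] = 1
console.log(elementGet(obj, weakPropertyX)); // prints 1
Object.keys(obj).length;                     // 0: the value lives in the WeakMap
```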

There are plenty of other use cases for @@geti/@@seti as well:

function pick(...names) {
  return { 
    [Symbol.geti]: (obj) => names.reduce((result, name) => (result[name] = obj[name], result), {}),
    [Symbol.seti]: (target, source) => { for (const name of names) target[name] = source[name]; }
  };
}

const obj = { a: 1, b: 2, c: 3 };

// pick properties to read from `obj`
const obj2 = obj[pick("a", "c")];
obj2; // { a: 1, c: 3 };

// pick properties to write to 'obj'
obj[pick("a", "b")] = { a: 4, b: 5 };
obj; // { a: 4, b: 5, c: 3 }
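The pick example can be approximated the same way. Again, hypothetical symbols and element-access helpers stand in for the proposed syntax.

```javascript
const geti = Symbol("Symbol.geti"); // hypothetical
const seti = Symbol("Symbol.seti"); // hypothetical

// Emulate obj[key] and obj[key] = value under the inverted protocol.
function elementGet(obj, key) {
  return key !== null && typeof key === "object" && geti in key ? key[geti](obj) : obj[key];
}
function elementSet(obj, key, value) {
  if (key !== null && typeof key === "object" && seti in key) key[seti](obj, value);
  else obj[key] = value;
}

function pick(...names) {
  return {
    // Read: copy the named properties into a new object.
    [geti]: (obj) => names.reduce((result, name) => (result[name] = obj[name], result), {}),
    // Write: copy the named properties from source onto target.
    [seti]: (target, source) => { for (const name of names) target[name] = source[name]; },
  };
}

const obj = { a: 1, b: 2, c: 3 };
const obj2 = elementGet(obj, pick("a", "c"));    // obj[pick("a", "c")] -> { a: 1, c: 3 }
elementSet(obj, pick("a", "b"), { a: 4, b: 5 }); // obj is now { a: 4, b: 5, c: 3 }
```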

The @@geti/@@seti methods would be a convenient and consistent mechanism for all of these cases (including a Range).

rbuckton commented 6 years ago

@ljharb while I understand your concern about @@splice (and most languages that implement some kind of array slice notation don't support this either), I do wonder about the inconsistency of not having it:

a = b; // regular assignment
[a] = [b]; // destructuring assignment
a[0] = b[0]; // regular assignment
a[1:3] = b[1:3]; // not supported?

caub commented 6 years ago

so a[1:3] = 2 is invalid right? it has to be a[1:3] = [4, 4] for example

I guess a[1:3] = [4] would assign 4 to a[1] and undefined to a[2] or would it leave it the same?

and a[1:3] = a[1:3:-1] would switch items :)

It's another reason to not apply this slice-notation to strings, since setter/splice wouldn't make sense for them. But it would still be very interesting to have @@slice and @@geti for strings

rbuckton commented 6 years ago

@caub:

- a[1:3] = 2 would probably be invalid because 2 is not an array or iterable (see below).
- For x[a:b] = z, I had imagined the semantics would be something like x.splice(a, (b - a), ...z): The elements at x[a:b] are removed from x and the elements in z are inserted in their place. This is also why @@splice ignores the "step" argument, because all of those elements would be replaced.
- Yes, I imagine that is how that would work given the above semantics.

rbuckton commented 6 years ago

Also, removing a section of the array could be something like a[5:10] = []
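The assignment semantics described above amount to a splice. assignSlice is a hypothetical helper that sketches x[a:b] = z and ignores the step, as @@splice would.

```javascript
// Hypothetical helper sketching x[a:b] = z: remove x[a..b) and insert the
// elements of z (any iterable) in their place. The step is ignored.
function assignSlice(x, a, b, z) {
  x.splice(a, b - a, ...z);
  return x;
}

assignSlice([1, 2, 3, 4, 5], 1, 3, [7, 8, 9]); // [1, 7, 8, 9, 4, 5]

// a[5:10] = [] removes a section:
assignSlice([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5, 10, []); // [0, 1, 2, 3, 4, 10]

// a[1:3] = 2 fails, because spreading 2 throws a TypeError (not iterable).
```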

caub commented 6 years ago

but splice wouldn't support the step (in start:end:step)? I mean it gets very confusing:

a=[1,2,3,4,5,6]; a[0:4] = [7,8,9,10,11] would transform a in [7,8,9,10,11,5,6], just like Array.prototype.splice

but a=[1,2,3,4,5,6]; a[0:4:2] = [7,8,9,10,11] would transform a in [7,2,8,4,9,6,10,11]?

if we ever want to change an array, we can always do a = [...a[0:i], ...a[i+1:]] for example to remove ith item. Having only Array @@slice and Range @@geti could be simpler (and it'd work better with 'read-only' strings)

But I admit with slice only we can't do the second example (insert items every step), so I'm neutral for @@splice/@@seti

caub commented 6 years ago

Could it work in destructuring? like so:

const a  = [1,2,3,4,5];
const {[0:-1]: a1, [a.length-1]: last} = a;
// a1 == [1,2,3,4]
// last == 5 // this already works
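For reference, the snippet below shows the part of this that already works, plus the closest current equivalent for the [0:-1] part.

```javascript
const a = [1, 2, 3, 4, 5];

// Computed keys in object destructuring already work on arrays:
const { [a.length - 1]: last } = a; // 5

// There is no destructuring form for [0:-1] today; slice is the closest equivalent:
const a1 = a.slice(0, -1); // [1, 2, 3, 4]
```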

rbuckton commented 5 years ago

C# 8 has added ranges and indexes, which includes both syntax and types for these behaviors:

The Index type represents a position relative to the start or end of an indexed collection. The ^n syntax is shorthand for new Index(n, true).
The Range type represents a start and end Index within an indexed collection. The x..y syntax is a shorthand for new Range(x, y).

caub commented 5 years ago

I'll start writing a babel plugin for it

caub commented 5 years ago

I did a polyfill with acorn: https://github.com/brigand/jellobot/pull/31/files#diff-a1284a77ff99b45ce588591eddda54a9, I'll try with babel later

rbuckton commented 5 years ago

In light of #30, I've been tinkering with what this might look like in ECMAScript: https://gist.github.com/rbuckton/174b02d2a43573627201f8057701044c:

Adds an Index built-in object that can be used to compute an index relative to the start or end of a collection.
Adds the ^n syntax as a shorthand for new Index(n, "end")
Adds an Interval built-in object that can be used to compute the start, end, and step for a collection.
Adds the (m:n) and o[m:n] syntax (as well as (m:n:s) and o[m:n:s] for a custom stepping value).
Adds a @@geti symbol for an "inverted-get": a[b] --> b[@@geti](a)
Adds a @@seti symbol for an "inverted-set": a[b] = c --> b[@@seti](a, c)
Adds a @@indexedGet symbol used to define a method to get a value based on an Index.
Adds a @@indexedSet symbol used to define a method to set a value based on an Index.
Adds a @@slice symbol used to define a method to get values based on an Interval.
Adds a @@index symbol used to define the method on an Index used to calculate the actual index based on a provided length.
Adds a @@interval symbol used to define the method on an Interval used to calculate the actual start/end/step based on a provided length.

The @@index and @@interval symbols provide a mechanism to calculate an actual index or interval based on a provided length. This would allow us to define an arbitrary endpoint like ^1 to mean "one from the end" when the "end" is not yet known.

The @@indexedGet, @@indexedSet, and @@slice symbols provide an extensibility mechanism for users to implement custom collection classes and control how to determine the length to pass to an Index or Interval.

Index Example:

let ar = ["a", "b", "c", "d"];
let m1 = ^1;
         // --> new Index(1, "end");

ar[m1]; // "d"
// --> m1[Symbol.geti](ar)
// --> ar[Symbol.indexedGet](m1)
// --> ar[m1[Symbol.index](ar.length)]
// --> ar[ar.length - 1]
// --> ar[4 - 1]
// --> ar[3]
// --> "d"
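Below is a minimal sketch of how an Index might resolve against a length only once the collection is known. indexSymbol, indexedGet, the Index field names and the List class are hypothetical illustrations, not the API from the gist.

```javascript
// Hypothetical stand-ins for the proposed @@index and @@indexedGet symbols.
const indexSymbol = Symbol("Symbol.index");
const indexedGet = Symbol("Symbol.indexedGet");

// Hypothetical Index: a position counted from the "start" or the "end".
class Index {
  constructor(value, from = "start") {
    this.value = value;
    this.from = from;
  }
  // Resolve to a concrete index once the collection's length is known.
  [indexSymbol](length) {
    return this.from === "end" ? length - this.value : this.value;
  }
}

// Hypothetical collection that decides which length an Index resolves against.
class List {
  constructor(...items) {
    this.items = items;
  }
  [indexedGet](index) {
    return this.items[index[indexSymbol](this.items.length)];
  }
}

const list = new List("a", "b", "c", "d");
const m1 = new Index(1, "end"); // stands in for ^1
list[indexedGet](m1);           // "d"
```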

Interval Example:

let ar = ["a", "b", "c", "d"];
let r = (0:^1);
        // --> new Interval(new Index(0, "start"), new Index(1, "end"))

ar[r]; // ["a", "b", "c"]
// --> r[Symbol.geti](ar)
// --> ar[Symbol.slice](r)
// --> Slice of `ar` for `r[Symbol.interval](ar.length)` as ([start, end, step])
// --> Slice of `ar` for `[r.start[Symbol.index](ar.length), 
//                         r.end[Symbol.index](ar.length), 
//                         r.step]` as ([start, end, step])
// --> Slice of `ar` for `[0, ar.length - 1, 1]` as ([start, end, step])
// --> Slice of `ar` for `[0, 4 - 1, 1]` as ([start, end, step])
// --> Slice of `ar` for `[0, 3, 1]` as ([start, end, step])
// --> ["a", "b", "c"]
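Along the same lines, an Interval could defer resolving both endpoints until a length is supplied. intervalSymbol, sliceByInterval and the field names are again hypothetical, and a positive step is assumed.

```javascript
// Hypothetical stand-ins for the proposed @@index and @@interval symbols.
const indexSymbol = Symbol("Symbol.index");
const intervalSymbol = Symbol("Symbol.interval");

class Index {
  constructor(value, from = "start") {
    this.value = value;
    this.from = from;
  }
  [indexSymbol](length) {
    return this.from === "end" ? length - this.value : this.value;
  }
}

// Hypothetical Interval: resolves both endpoints against a length into [start, end, step].
class Interval {
  constructor(start, end, step = 1) {
    this.start = start;
    this.end = end;
    this.step = step;
  }
  [intervalSymbol](length) {
    return [this.start[indexSymbol](length), this.end[indexSymbol](length), this.step];
  }
}

// What an array's @@slice might do with an Interval (end exclusive).
function sliceByInterval(ar, interval) {
  const [start, end, step] = interval[intervalSymbol](ar.length);
  const result = [];
  for (let i = start; i < end; i += step) result.push(ar[i]);
  return result;
}

const ar = ["a", "b", "c", "d"];
const r = new Interval(new Index(0, "start"), new Index(1, "end")); // stands in for (0:^1)
sliceByInterval(ar, r); // ["a", "b", "c"]
```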

Host engines like V8 could choose to optimize code paths during compilation to remove the reification of Index and Interval types at runtime.

(edit: switched from Range to Interval)

caub commented 5 years ago

What's the advantage of 0:^n over 0:-n or 0: (undefined endIndex to represent ^0)?

Is your idea to completely avoid this notation for strings, since assignment expressions wouldn't make sense for them (we can also add the arguments of https://github.com/tc39/proposal-slice-notation#should-we-ban-slice-notation-on-strings)?

I feel it'd be good to avoid introducing new built-in objects (to reduce the "cost" and complexity of this proposal). I thought about a Slice built-in at some point, but an engine can handle the range syntax and range expressions without any additional built-in, just as there is no dedicated constructor for ArrowFunctionExpression or for many operators.

I implemented slice-notation/slice-expression in https://github.com/engine262/engine262/pull/89/files#diff-7a3164ab8de945e8bd82f29aa3f3b300R10-R27 It should actually be this (using Symbol.slice #1):

  if (expression.type === 'SliceExpression') {
    // Evaluate the optional start, end, and step operands; omitted ones stay undefined.
    let start = Value.undefined;
    let end = Value.undefined;
    let step = Value.undefined;
    if (expression.startIndex) {
      const startPropertyRef = yield* Evaluate(expression.startIndex);
      start = Q(GetValue(startPropertyRef));
    }
    if (expression.endIndex) {
      const endPropertyRef = yield* Evaluate(expression.endIndex);
      end = Q(GetValue(endPropertyRef));
    }
    if (expression.step) {
      const stepPropertyRef = yield* Evaluate(expression.step);
      step = Q(GetValue(stepPropertyRef));
    }

    // Look up @@slice on the base value (throws on null/undefined).
    const bv = Q(RequireObjectCoercible(baseValue));
    const slice = Q(GetMethod(bv, wellKnownSymbols.slice));
    // #sec-call: call with the base value as the this value; Call throws a
    // TypeError if no callable @@slice was found.
    return Call(slice, bv, [start, end, step]);
  }

Compared to a Slice built-in object, or the Interval built-in you propose, it's slightly more limited, but only in cases like:

arr[(() => (0:2))()] // would not be like arr[0:2]
// it'd be like arr[ToString((() => (0:2))())] rather

That's because we don't evaluate/resolve the SliceExpression into a value the way we could with a built-in object. I don't think it's an issue, though, since this feature is intended to be used 'statically'.

rbuckton commented 5 years ago

One motivator for ^1 over -1 is that ar[-1] already has a meaning in ECMAScript, while ar[^1] does not. You could also conceivably use it with other APIs (e.g. text.indexOf("a", ^3)).

caub commented 5 years ago

Well, true: I don't think it's possible to extend .indexOf to handle a negative startIndex because of backward compatibility.

There's https://github.com/keithamus/proposal-array-last proposing arr.lastItem and arr.lastIndex, but that's not really practical.

I like this ^n idea, and I also think we could avoid a built-in Index object for it: similarly to what I did, we could only have syntax for it and evaluate it in context (only in a MemberExpression; elsewhere it doesn't really make sense).

EDIT: it seems .indexOf already works with negative indexes:

[...'banana'].indexOf('a', -3)
// 3

but only for Array.prototype.indexOf; String.prototype.indexOf clamps a negative position to 0:

'banana'.indexOf('a', -3)
// 1

~~but there are String.prototype.lastIndexOf, Array.prototype.lastIndexOf for those cases~~

rbuckton commented 5 years ago

Is your idea to completely avoid this notation for strings, since assignment expressions wouldn't make sense for them [...]

Given the feedback in this thread, @@splice seems to be off the table for now. The upside of the approach I outlined WRT strings is that a String could control how a relative "end" is applied:

- If slice is based on code units, then we would define @@slice to call @@interval on the supplied interval with the string's length.
- If slice is based on code points, then we would instead define @@slice to call @@interval on the supplied interval with the number of code points in the string.
  - Alternatively, we could define CodePointInterval and CodePointIndex types, along with @@codePointInterval and @@codePointIndex symbol methods, to give you fine-grained control over the behavior:
```
text[0:^1] // slice via code units
text[new CodePointInterval(0, ^1)] // slice via code points

// helper to convert a code unit index/interval to a code point index/interval:
const cp = {
  [Symbol.index]: index => new CodePointIndex(index.value, index.end),
  [Symbol.interval]: ival => new CodePointInterval(ival.start, ival.end, ival.step)
};

text[cp[^1]]   // last code point
text[cp[0:^1]] // string except last code point
```

caub commented 5 years ago

Yes, I agree, and I'd still prefer to handle those cases without additional built-ins (or at least with fewer of them). We can also think of BigInts: 0n:2n:1n would work in my implementation without defining new built-ins (only Symbol.slice, and possibly also Symbol.index if we go for the ^n syntax).

The specific behavior for String you described would be inlined in String.prototype[Symbol.slice], and it'd be overridable if needed (by extending String; I don't know if that's a good idea, though).

hax commented 4 years ago

@rbuckton Any update?

It seems there are too many things we want to add. Maybe we can minimize them and write a separate proposal? For example, we could first specify:

reverse index syntax (^1)
index range (interval) syntax (0:^1)
x[^1] syntax and semantic for Array, TypedArray and String
x[0:^1] syntax and semantic for Array, TypedArray and String

and leave all other things like ^a syntax, Index, a:^b syntax, IndexRange (Interval) and symbols to follow-on proposals.

caub commented 4 years ago

I think the idea was to desugar the reverse index syntax and the index range syntax to Index and IndexRange so they would come together

tc39 / proposal-slice-notation

Slice Extensibility #19