tc39 / proposal-iterator.range

A proposal for ECMAScript to add a built-in Iterator.range()
https://tc39.es/proposal-iterator.range/
MIT License
487 stars 13 forks source link

Normative: Switch to class + iterable semantics. #42

Closed Jack-Works closed 1 year ago

Jack-Works commented 4 years ago

close #41, close #22, close #17

I used the approach I have described in 3.1.b (non-global non-constructible class + creator function) of #41.

The biggest change to the developers is that they have to add a .values() before they use the range with Iterator Helpers.

- Number.range(0, 5).take(2)
+ Number.range(0, 5).values().take(2)

Please choose "hide white space changes" when viewing the diff, which will make the review easier.

Other changes:

devsnek commented 4 years ago

should be new Number.Range() to avoid confusion.

Jack-Works commented 4 years ago

should be new Number.Range() to avoid confusion.

Please read #41 for explanation

devsnek commented 4 years ago

You could also do NumberRange.from() or Number.Range.from or something like that as well. Not following existing patterns does nothing except make it harder for people to learn the language.

Jack-Works commented 4 years ago

Now the underlying class is "NumericRange" and works for both number and bigint. It requires a "type" to indicates it is a number range or bigint range. That's what be strange to have Number.Range but actually points to NumericRange. (Or maybe subclassing it to provide the first parameter implicitly?)

Another approach is your said, but not a "from", it needs to distinguish number and bigint so maybe it needs each method for each number type.

devsnek commented 4 years ago

you have to pass numeric values for the start and end, seems pretty unambiguous to me.

Jack-Works commented 4 years ago

you have to pass numeric values for the start and end, seems pretty unambiguous to me.

Do you mean to infer the type of the NumericRange from the start? (end can be infinity in bigint).

devsnek commented 4 years ago

that's one option, yes

domenic commented 4 years ago

This is really disappointing. Many people made strong comments in https://github.com/tc39/proposal-Number.range/issues/17 explaining how the idiomatic pattern for the language is to return an iterator. I'm sad to see this change and think that in practice this makes the proposal much less useful and usable. In fact, I'd say that this change would make the proposal not worth including in the language, since it causes the complexity-vs.-benefit tradeoff to become too high. (Introducing new or deprecated notions, like hidden/non-constructible/new-less classes, is a big part of this.)

I'd urge you to reconsider, and keep the existing semantics, per the cogent arguments from @devsnek, @tabatkins, @ljharb, and others in #17.

hax commented 4 years ago

@domenic I think there is a disagreement that whether the idiomatic pattern for the language is to return an iterator. And as the research @Jack-Works did almost all programming languages choose expose iterable-like semantic instead of iterator-like semantic for range-like API. We need a very strong reason to deviate from that.

devsnek commented 4 years ago

those languages don't use js's iterator patterns, it seems undesirable to draw a comparison with them.

hax commented 4 years ago

@devsnek What languages? What's the difference??

devsnek commented 4 years ago

@hax the table of languages you linked to in your comment above. this is something we went over a lot in the iterator helper proposal, with people wanting it to be for iterables instead. There are lots of closed issues on that repo (for example issue 8) with info on this.

hax commented 4 years ago

I don't see there are any essential concept difference between js iterator vs for example python iterator.

devsnek commented 4 years ago

I'm not an expert on python's iteration design, but I do know that range() in python is a class, not a function (what i was suggesting above).

hax commented 4 years ago

Python do not need new keyword, so it could have same surface syntax like global function. Personally I'm fine with new Number.Range(). I think @Jack-Works try to minimize the userland code, so he try to make Number.range() as a factory method which returns a Number.Range instance.

Anyway , the key point here is range-like things are iterable-like not iterator-like in most other languages, and I don't think there is any essential differences of iterable/iterator concept between those languages and JS. If anyone think there are, pls explain.

devsnek commented 4 years ago

If anyone think there are, pls explain.

I'm not going to rewrite this here, but I encourage you to look through the closed issues on https://github.com/tc39/proposal-iterator-helpers because this stuff is discussed in depth.

domenic commented 4 years ago

Yes. I think this difference is really well established in the language now, and was also covered in #17.

hax commented 4 years ago

To be clear, I'm not arguing whether iterator helpers should be iterable helpers (I'm ok that iterator helpers use iterators), I'm asking for what reason range should be iterator, especially range-like thing in most other languages are iterable not iterator.

domenic commented 4 years ago

I keep referring you to #17, but maybe it would help to refer you to some specific comments, which explain why JS is different from other languages. Some of my favorites are the first paragraph of https://github.com/tc39/proposal-Number.range/issues/17#issuecomment-604110515, the second paragraph of https://github.com/tc39/proposal-Number.range/issues/17#issuecomment-642440597, the explanation that (unlike other langauges) generators return iterators in https://github.com/tc39/proposal-Number.range/issues/17#issuecomment-643482131, and more.

Jack-Works commented 4 years ago

It seems like we ended in an endless argue in two sides 😥

Sometimes I think Oh why not have an auto-clone @@iterator or make it return a higher order function with @@iterator on it. If we're in the user land, it really a creative way to resolve this problem. But we cannot have it in the language... It's extremely hard to choose one route cause both routes seems reasonable to me and both routes have their own fatal problems.

Is questionnaire or research deus ex machina? I don't know. Never mind, just some nonsense.

Jack-Works commented 3 years ago

I'd like to ping the research call members maybe @codehag? Do we have any on-going research/questionnaire about which route is less error-prone? (though I believe the iterator one is error-prone, I cannot prove it) 👀

codehag commented 3 years ago

Sure! Let's unblock this, one way or the other. I am a little slow to respond, sorry if i missed previous pings. Next one is in exactly a month. Does the time on the calendar work for you or should we shift?

Jack-Works commented 3 years ago

Is there any public survey for normal developers has been published? I think that will help us to figure out how developers thought on this topic.

codehag commented 3 years ago

We've done a few surveys, and there are a number precedents, but the tool should fit the question, and the question needs to be well scoped. If, for example, you want to test error proneness of the two approaches (doing something similar to A/B testing), we could design something around that. On the other hand, if you want to see which facets of a given design stand out, you can do something like Felienne's research on block based languages: https://www.researchgate.net/publication/336187404_XLBlocks_a_Block-based_Formula_Editor_for_Spreadsheet_Formulas -- this uses the CDN framework which has a well defined survey for running such studies.

So, those are two starting points. But what we really need is a good understanding of the disagreement here. I only have a high level understanding so far (it looks like language consistency and ergonomics issues, with ergonomics being broadly defined?), I would need to dig a bit deeper to be of any help there, so that we know what we are actually investigating. We want to be careful not to encode our biases into a study.

For example, if we take the assertion that "error-proneness concern is coming from the refactoring hazard" made at the beginning of the thread in (i think) #17: That is something we can check, but we have to also verify that the example is realistic -- a survey run with an encoded bias will return for you the bias that you want. It is easy to make the numbers say what you want, but this would be poor methodology. A way to avoid this, is when designing the study - try to disprove your hypothesis. Another tool for building something like this is "what would convince me otherwise".

codehag commented 3 years ago

Having read some of the issues in a bit more depth, I am not sure that this will be addressed by the two studies kinds I listed above. One part of that is because it looks like the study, if applied to this draft, would be testing something that is outside of the categories that would be accepted by the committee. So it doesn't make sense to do that. The other issue is that it looks like the use-case for reusable number.range hasn't been proved, it was just suggested as part of the design space. Testing alternatives without proving the use case might be getting ahead of ourselves. Determining the importance of it may help resolve this issue. Would it be right to say that that is the determining question?

Let's discuss this in depth in the research call so we know what we are really trying to answer.

Jack-Works commented 3 years ago

A personal recap with my preferring.

  1. Egnormics in normal use cases (No problem)

This is no longer a problem for any route we have. For iterators Number.range(a, b) is good. For iterables we can make Number.range(...) a shortcut to new NumberRange(...) so they are exactly the same.

  1. Egnormics with Iterator helper (No problem)

This is no longer a problem for any route we have. Iterators work with iterator helper naturally so no problem. Though range[Symbol.iterator]() is a big problem, hax (IIRC) suggests to add .values() (like Array#values) so there is no problem either. .values() are very easy to write (w/ or wo/ the editor with LSP support).

  1. Refactoring hazard (!!!)

For iterables there is no problem.

Developer might want to refactor this code because there is duplication.

for (const a of Number.range(0, 10)) {
    // 0 to 10
}
for (const a of Number.range(0, 10)) {
    // 0 to 10
}

But once you did that, the program brokes.

const range = Number.range(0, 10)
for (const a of range) {
    // 0 to 10
}
for (const a of range) {
    // No! The range has already consumed! Nothing yielded.
}

The correct way to do that is:

const range = () => Number.range(0, 10)
for (const a of range()) {
    // 0 to 10
}
for (const a of range()) {
    // Got 0 to 10 again
}
  1. Extensibility

We need to consider how to add new utils on the return value (iterator or a class instance) in the future.

=============

This is what problems we have I can recall from the previous discussions and with my personal ideas (and as you can see I'm preferring iterables currently due to reasons 3 and 4).

I think the key point is to investigate how will people use the range and if they will go into the refactor trap I mentioned before. If so, I think the iterator way is error-prone.

ljharb commented 3 years ago

In general, that is a not a safe refactor to make, and it would be a bad idea imo to encourage people to falsely think it’s a safe refactor. Iterators simply are not reusable generically, even if some are.

domenic commented 3 years ago

Exactly. If you replaced Number.range(0, 10) in your example with someMap.values(), it would also not be a safe refactor. Making Number ranges reusable would hurt people's intuitions for the rest of the language.

Jack-Works commented 3 years ago

So the problem is that we need to make sure if devs will do this in wild, that is what we need to investigate. I'm not encouraging devs to do refactor like this but they may do it.

hax commented 3 years ago

@domenic

If you replaced Number.range(0, 10) in your example with someMap.values()

As I explained many times, people normally don't need to factor out someMap.values() because someMap already be a reusable point. On the other side, people likely to factor out Number.range(from, start) to a variable, because there is no reusable point, u need to repeat yourself every time.

The simple test is, if someone don't want repeat code, how will they do?

Do they write let r = Number.range(0, 10) or let r = () => Number.range(0, 10)? My experiences tell the the former, especially this is how things work in many other languages.

codehag commented 3 years ago

I would like to have a better understanding of the usecase -- this is the research that I was proposing to @Jack-Works -- it's a loose corpus study -- in case you want to get an idea of how that works. I used python in the context of mozilla to validate it as the language has had this feature for a long time.

The example of Mozilla's python code is not representative, but as it reuses a number of python libraries and is easily searchable, nor is this exhaustive, I just did this in a few hours -- I think it gives us a decent idea of what we are working with.

Some on the fly vocabulary: Bare assignment -- Ranges that have not been wrapped or cast to another type. Cast -- Ranges that have been wrapped or cast into another type. Throw-away -- Ranges that are not assigned, but re-initialized each time even when the same value might be used.

In a review of searchfox, I found that the usage of range as a throw away value exceeded the use of "bare range" being assigned to a value significantly. Total usage of range was appx. 1500. There were 112 bare range assignments, which I used as a good indicator of potential problematic reuse. This is compared to throw-away usage which was 1197, and finally there is the category of cast range, which accounts for ~170 cases. Bare range assignment accounts for about 7%. Total assignment of range for potential reuse, cast or bare, numbers about 300, or appx. 11%. I came across cases where a range was passed as an argument, but for most of these it was single use, or as a cast and I didn't investigate deeply. This is prior to verification of true reuse.

I categorized different kinds of bare usage, as well as true reuse according to frequency. In this segment I focus exclusively on Bare range assignement that does not qualify as true reuse can be categorized as follows:

True reuse occurs in the following forms:

Vague:

I would like to raise one specific instance I found of bare range reuse. The second use case of true bare reuse, involves code where a list of columns is replaced by a range of columns, and then that range or list is cast back into a list. This appears to be a case where the language was too flexible, and the reusability of range actually poses a risk of error proneness due to a conflation of types on the same variable. This is something that we might avoid by not making range reusable.

To summarize -- Reuse is rare in the case of ranges, and it appears that the refactoring hazard case is unrealistic (raised just before by @hax). Designing for this use case at the cost of language consistency is therefore not advisable. Additionally, the reusability of range may be error prone as shown by the conflation of types on the same variable. Finally, it highlights that we are spending time discussing a use-case that happens very infrequently, having been found in it's potential hazard form < 10 times in a codebase with a sample of ~1500 uses of range in python.

I would like to focus on one of the findings was that range would sometimes be reused in the following form:

list(range(10))
enumerate(range(10))
iter(range(10))
etc.

This form of cast reuse, as noted above, significantly outnumbered true bare range reuse. This is very interesting, and perhaps my most important finding so far. In a language that provides this capability of bare range reuse, it is largely not used.

I would emphasize that–given that reuse is rare, when it is reused it is usually cast to another type, and that the most common form is as a throw away value–designing for that use-case at the expense of language consistency is a poor language design choice. We would be prioritizing a very rare use over the learnability of the language.

However, we can still make the uncommon case meaningful. In fact, the JavaScript language already has idioms that support this. The best format for this has already been suggested, in my opinion:

let x = Iterator.from(Number.range(0, 10));
let y = Array.from(Number.range(0, 10)); // probably rare, but possible

Works very nicely for reuse and addresses most of the use cases. It reflects the reuse of range in a meaningful way -- we had a "range", and it can be an iterator, or an array -- whatever the use-case for the developer. For example if they needed an array specifically, or were simply more comfortable with it. Number.range would function equally in both cases, while maintaining itself as effectively a throw away value (the common case). In fact, this form would more closely mirror what python programmers tend to do than what is being proposed here, which is shoehorn range into being an iterator with an irregular syntax. There is much to be lost in trying to force JavaScript to be something it is not.

This is verbose, but terse code isn't better code -- it is often more confusing and prone to errors, and does not do self-teaching to the programmer. By using Iterator.from, we can communicate what is being built while maintaining existing categorizations of behaviour in the language -- the only difference is that it is longer. There is no need to invent something new for this proposal. It undermines an otherwise good proposal.

My observation here is that we are stuck not on usability as usability does not exist without a use case, but on aesthetics of what now appears to be an uncommon case. Aesthetics, which very important, is not the same as usability and has less impact without it. I believe we will be more productive if we let this go. We are talking in circles about a hypothetical situation that is rare even in languages that had this built in from the start.

There may be reason to continue discussing ergonomics and communication of intent, but it is not this refactoring hazard. I would like it to pivot it towards readability and communication, away from this semantics discussion. I believe we should think more creatively of how to communicate to programmers rather than privilege the rare case over the common one. It is an exercise we will likely need to do anyway: think about how we are naming this. range is a noun here. We could change to a verb such as count or iterate. This would be a more productive direction than going in circles. By using a verb, the consumption of the value will become more self-evident.

Edit: Not an expert here, so replaced iterable with throw-away to be more specific.

Jack-Works commented 3 years ago

Thanks, @codehag, this is very useful information to know, I guess the refactoring hazard is not a problem for iterator way now.

range used across a language boundary actually surprise me 😜

I need to mention that Python actually using the iterable semantics so reuse is actually no bugs.

x = range(0, 5)
x.__iter__()
>>> <range_iterator object at 0x0000021487858A10>
x.__class__
>>> <class 'range'>

I'd like to do a temp check, what semantics are you support (in a general design pattern, not the design details), exactly? Just in case to prevent we already reached an agreement but we don't know.

Jack-Works commented 3 years ago

It's ok for me and hax if we rename this to Iterator.range (Iterator.numericRange) to make it super clear about it is an iterator

Andrew-Cottrell commented 3 years ago

It's ok for me and hax if we rename this to Iterator.range (Iterator.numericRange) to make it super clear about it is an iterator

They could also be renamed to Number.rangeIterator and BigInt.rangeIterator for the same clarity; and this has an obvious continuation to Date.rangeIterator, etc. if considered useful in a future proposal.


I think I may have misunderstood. You were asking about renaming this proposal's name rather than any spec'd API?

Alhadis commented 2 years ago

What's the rationale for returning an iterator instead of a plain Array? Is it performance? Or is all this just so users can return an infinite range? Because the latter seems like an edge-case to me: I honestly can't foresee any realistic scenarios where one would need an infinite range that isn't part of a generator. Consider the example in the readme:

README.md example:Simplified equivalent:
~~~js function* even(){ for(const i of Number.range(0, Infinity)) if(i % 2 === 0) yield i; } ~~~ ~~~js function* even(){ for(let i = 0; ; ++i) if(i % 2 === 0) yield i; } ~~~

Conversely, returning a pre-populated list of values is pretty much what I'd expect a method named .range() to return:

const digits = Number.range(0, 9);
1 === digits[1];
9 === digits.pop();

It's also objectively more ergonomic, given the majority of users will probably need to convert the result to an array anyway:

const uppercase = Number.range(65, 90).map(n => String.fromCharCode(n));
const lowercase = Number.range(97, 122).map(n => String.fromCharCode(n));
// Versus:
const uppercase = Array.from(Number.range(65, 90)).map(n => String.fromCharCode(n));
const lowercase = Array.from(Number.range(97, 122)).map(n => String.fromCharCode(n));
devsnek commented 2 years ago

you should open a new issue

Alhadis commented 2 years ago

@devsnek Done. See #53.