mustache / spec

The Mustache spec.
MIT License

Proposal: Accessing nth element of an array #109

Closed alanyee closed 8 months ago

alanyee commented 5 years ago

Only some implementations of mustache have the feature of accessing the nth element of an array. Such a feature is useful, and it should be included in the spec so that all up-to-date implementations have this feature.

chrisdotcode commented 1 year ago

I agree. I believe that this behavior is worth explicitly stipulating as required functionality by the spec. ~As of now, not only array access, but nested object access is not mentioned anywhere in the spec. As such, foo.bar is currently undefined behavior, but many specs implicitly support it already.~ (This is untrue, cf. https://github.com/mustache/spec/issues/138#issuecomment-1237324389. In reality, the documentation on https://mustache.github.io/mustache.5.html just doesn't mention this, so I thought it was implementation-specific behavior. Perhaps I can make a PR to update the doc on that page to explicitly mention dot-access?)

In JavaScript, since accessing an array by an index is roughly the same as accessing an object by key (because Arrays in JavaScript are just Objects), we get foo.0 "for free". But that seems to be a happy implementation-specific coincidence? Can we formally specify this behavior so that we can rely on it across implementations? (There might need to be two different syntaxes, though: one for objects (e.g.: foo.bar), and one for arrays (e.g.: foo[0]), as I'm sure many already-existing parsers might not like variable names that start with a number (e.g.: foo.0).)

I suspect @jgonggrijp could say that a power lambda implementation is possible, like in https://github.com/mustache/spec/issues/138#issuecomment-1286225095, but accessing array items by index is SO common, and already supported by implementations, so I'm hoping we can just formally specify already-existing behavior at this point?

jgonggrijp commented 1 year ago

> I agree. I believe that this behavior is worth explicitly stipulating as required functionality by the spec.

Do keep in mind that adding required functionality to the spec implies issuing a major version upgrade. This, in turn, implies a lot of coordination and a long delay.

> (...) the documentation on https://mustache.github.io/mustache.5.html just doesn't mention [dotted names], so I thought it was implementation-specific behavior. Perhaps I can make a PR to update the doc on that page to explicitly mention dot-access?)

I already went that route, but the update got stuck on the way to actual publication (see https://github.com/mustache/mustache/pull/266 and https://github.com/mustache/mustache.github.com/pull/149). In the meantime, I'm hosting an up-to-date version of the mustache(5) manpage over here, which does include dotted names. Feel free to tell other people about it. If the need arises, you can submit a pull request to https://github.com/jgonggrijp/mustache.

> In JavaScript, since accessing an array by an index is roughly the same as accessing an object by key (because Arrays in JavaScript are just Objects), we get foo.0 "for free". But that seems to be a happy implementation-specific coincidence?

Yes, that's exactly what it is.
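To illustrate why this is a coincidence of JavaScript rather than something every port gets automatically, here is a minimal Python sketch. It assumes a simplified dotted-name resolver that only performs mapping-key lookups and ignores the context stack; `resolve_naive` and the sample `view` are hypothetical names, not taken from any real implementation.

```python
def resolve_naive(context, dotted_name):
    """Resolve a dotted name using mapping-key lookup only (hypothetical helper)."""
    value = context
    for part in dotted_name.split("."):
        # Only dict values are handled here; a Python list has no key "0",
        # so a numeric name part is simply a miss.
        if isinstance(value, dict) and part in value:
            value = value[part]
        else:
            return None  # treat as a name miss
    return value


view = {"foo": {"bar": "x"}, "items": ["a", "b"]}
print(resolve_naive(view, "foo.bar"))  # x
print(resolve_naive(view, "items.0"))  # None
```

In a language where lists and mappings are distinct types, `items.0` only works if the implementer adds a dedicated branch for it.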

> Can we formally specify this behavior so that we can rely on it across implementations? (There might need to be two different syntaxes, though: one for objects (e.g.: foo.bar), and one for arrays (e.g.: foo[0]), as I'm sure many already-existing parsers might not like variable names that start with a number (e.g.: foo.0).)

It's complicated. Some programming languages start at 1 instead of 0, for example R. Some implementations will trip over dotted name parts starting with digits, as you already suggested. Some programming languages already use square brackets for different syntactic functions (for example Mathematica, which uses them for function calls) and/or have a different native syntax for array subscripts.

I'm not saying these complications couldn't be overcome or that specifying array subscripting is necessarily a bad idea. I'm just saying that it isn't as straightforward as it may seem.
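A rough sketch of what those complications look like in practice, again in Python. Everything here is an assumption made for illustration: the `resolve` function, the `allow_index` opt-in, and the `index_base` parameter are hypothetical and do not come from the spec or from any existing implementation.

```python
def resolve(context, dotted_name, allow_index=False, index_base=0):
    """Resolve a dotted name, optionally treating integer parts as list indices.

    allow_index and index_base are hypothetical knobs for this sketch only.
    """
    value = context
    for part in dotted_name.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif (allow_index and isinstance(value, (list, tuple))
              and part.isascii() and part.isdigit()):
            # The same template means different things depending on whether
            # the host convention counts from 0 or from 1.
            position = int(part) - index_base
            if not 0 <= position < len(value):
                return None  # out of range is treated as a name miss
            value = value[position]
        else:
            return None
    return value


view = {"items": ["a", "b", "c"]}
print(resolve(view, "items.0"))                                   # None
print(resolve(view, "items.0", allow_index=True))                 # a
print(resolve(view, "items.1", allow_index=True, index_base=1))   # a
```

The last two calls show the portability problem: `items.0` and `items.1` both select the first element, depending only on the base the implementation picked.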

> I suspect @jgonggrijp could say that a power lambda implementation is possible, like in #138 (comment),

Technically, this is true; I do think such a feature could be implemented using power lambdas and I might even say that. However, I don't really believe this would be a good application for power lambdas.

> but accessing array items by index is SO common,

Is it really? I think I have done {{#someList.0}} a couple of times in order to check for list nonemptiness, but this use case could be addressed in better ways without array subscripting (see #147). Other than that, I don't think I ever use array subscripts and I don't remember seeing them in other people's templates, either. Could you suggest a use case?

> and already supported by implementations, so I'm hoping we can just formally specify already-existing behavior at this point?

As stated before, while several implementations do support such a notation, it is far from universal. Formally specifying it won't be simple.

chrisdotcode commented 1 year ago

(Thanks for letting me know about the new documentation location. Will spread it.)

My train of thought here is that formally specified behavior is better than implementation-specific or undefined behavior. So even if the spec said something like, "Implementations MAY interpret integer dotted names as array indices", that would be sufficient and something we could get in a minor semver bump (and then maybe talk about requiring this behavior later). In my own personal experience, since mustache.js supports this (again, by happy coincidence), I thought that it was supported in all implementations, but was in for quite the surprise when that was not the case. Since the spec currently remains silent, the user is unsure as to whether they can rely on indexing across implementations. Explicit is better than implicit, right?

> It's complicated. Some programming languages start at 1 instead of 0, for example R. Some implementations will trip over dotted name parts starting with digits, as you already suggested. Some programming languages already use square brackets for different syntactic functions (for example Mathematica, which uses them for function calls) and/or have a different native syntax for array subscripts.

I totally get implementation complexity, but I don't think mustache should be "held back" by implementations, per se. For example, Python doesn't support dotted syntax for looking up dictionary keys (you write foo["bar"] instead), but it's up to implementers to conform to mustache's spec, not the mustache spec to conform to various languages. Nevertheless, I do see your point, because although mustache shouldn't be limited by implementations, it should certainly be "informed" by them, and try to reduce implementation load where possible. That being said, I think indexing notation, like foo[0], if supported, is common enough in programming to be understood as subscripting, regardless of the language (just like dereferencing for dynamic names).

> I don't think I ever use array subscripts and I don't remember seeing them in other people's templates, either. Could you suggest a use case?

Perhaps you're right. Maybe it's not as common as I'm thinking it is. However, with index access, you can alleviate some conditional logic in the middle of an array. For example, if you know foo[5] has some special value or is falsy, you can switch on foo[5] specifically. I don't think this is possible currently otherwise (without tons of if/else logic, or lots of superfluous lambdas).

What I think is a stronger argument for allowing this is the Zero One Infinity rule: we already allow indexing by key, why arbitrarily limit what else can be indexed? Index all of the things - as some implementations already allow you to do - but as I mentioned at the start, formal specification is better than implementation-specific (or undefined) behavior.

I know that, if approved, it will take some time before a 2.0, which is why I think the discussion is worth having now, so it can be hashed out if/when that milestone arrives.

jgonggrijp commented 1 year ago

> In my own personal experience, since mustache.js supports this (again, by happy coincidence), I thought that it was supported in all implementations, but was in for quite the surprise when that was not the case.

mustache.js is confusing a lot of people, because it seems official while it is actually rather nonstandard. While it has been around for a long time, it was never a reference implementation (that honor goes to the original Ruby Mustache), it moved in a different direction than the spec and the author does not intend to bring mustache.js closer to the spec again. People wrote alternative Mustache implementations for JavaScript for this reason (Twitter employees wrote Hogan, now end-of-life, and I wrote Wontache).

In this light, it is also rather unfortunate that the (outdated) Mustache website still uses mustache.js for the demo feature. You can use the Wontache playground instead; it supports lambdas and partials and it follows the spec 100%.

> Since the spec currently remains silent, the user is unsure as to whether they can rely on indexing across implementations. Explicit is better than implicit, right?

True. In that case, the text you suggested above should probably explicitly end with "... but you cannot rely on it".

> I don't think I ever use array subscripts and I don't remember seeing them in other people's templates, either. Could you suggest a use case?

> Perhaps you're right. Maybe it's not as common as I'm thinking it is. However, with index access, you can alleviate some conditional logic in the middle of an array. For example, if you know foo[5] has some special value or is falsy, you can switch on foo[5] specifically. I don't think this is possible currently otherwise (without tons of if/else logic, or lots of superfluous lambdas).

I understand the technical implications, but my question remains: why would anyone actually want to do this? What is so special about this individual element of a list, and if it is so special, why is it in a list buried between other, seemingly equivalent things in the first place? I would like to see a concrete example where an element is extracted somewhere from the middle of a list and there isn't a better solution.
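For comparison, one way to handle a "special" element without any subscript syntax is to give it a name when the view is built. This is only a sketch under assumptions: `build_view`, the `featured` key, and the sample rows are hypothetical, and nothing in this thread proposes this exact shape.

```python
def build_view(rows, featured_position):
    """Build a template view that exposes one list element under its own key.

    The 'featured' key is a hypothetical name chosen for this sketch.
    """
    view = {"rows": rows}
    if 0 <= featured_position < len(rows):
        # The template can now refer to the element by name instead of
        # by its position in the list.
        view["featured"] = rows[featured_position]
    return view


view = build_view(["a", "b", "c", "d", "e", "f"], 5)
print(view["featured"])  # f
```

A template could then switch on a section such as `{{#featured}}...{{/featured}}`, which relies only on ordinary key lookup.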

> What I think is a stronger argument for allowing this is the Zero One Infinity rule: we already allow indexing by key, why arbitrarily limit what else can be indexed?

Sorry to nitpick, but if I understand that rule correctly, supporting indexing only by keys is perfectly fine: it is the "one" case.

> Index all of the things - as some implementations already allow you to do -

Which implementations do this, according to you? I know that the JS implementations allow numeric keys, but that's because arrays in JS are actually just objects with property names containing only digits. None of the JS implementations allows accessing Set or Map entries, as far as I know.

> but as I mentioned at the start, formal specification is better than implementation-specific (or undefined) behavior.

This is not a universal truth. Leaving something undefined or implementation-defined is often a conscious, deliberate choice.

chrisdotcode commented 1 year ago

> mustache.js is confusing a lot of people...

That's for sure. Thanks for clearing that up. My primary reason for advocating for this was because I did indeed think mustache.js was the spec reference implementation. As such, my PoV was "since this behavior is already in the "official" implementation, why isn't it in the spec??"

> I understand the technical implications, but my question remains: why would anyone actually want to do this?

To be honest, I can't think of any use cases right this second (maybe @alanyee can), although I'm glad you aren't necessarily strictly opposed to adding it at some point. I know that I have needed this in the past.

> Sorry to nitpick, but if I understand that rule correctly, supporting indexing only by keys is perfectly fine: it is the "one" case.

> Leaving something undefined or implementation-defined is often a conscious, deliberate choice.

You're right about that first part. As per my first paragraph in this comment, I think I have a "problem" more with the undefined, unspecified, half-supported behavior than with whether or not this is actually permitted in the spec. It would be nice to have the behavior explicitly quantified, like with an "Implementations MAY..." statement. As of right now, I'd argue the silence of the spec is worse than implementation-defined behavior: in C, for example, the spec will say "such and such is undefined, or implementation-specific". But we've got nothing.

As you've noted, thanks to mustache.js' current supremacy (and the corresponding outdated doc), users can be fooled into thinking that whatever mustache.js does is The Blessed Way(TM), and that all differing implementations are therefore non-compliant, hence my request for specification. That, in conjunction with the spec's silence, lulls you into a false sense of security, as we both have observed.

So, at minimum, even if this doesn't get implemented, I feel like an "Implementations MAY interpret integer dotted names as array indices" statement could be very helpful for clearing up confusion (and as per https://datatracker.ietf.org/doc/html/rfc2119#section-5, we wouldn't necessarily need to state that the behavior isn't to be relied upon, since that's what "MAY" is defined as).

jgonggrijp commented 1 year ago

Alright, feel free to submit a pull request. Let's see whether we can reach a consensus on how to formulate this.

jgonggrijp commented 9 months ago

Update to the official website being outdated: it is much less outdated now! The mustache(5) manpage is now current with the latest version of the spec.