Digit separators - Githubissues

jfbastien commented 6 years ago

Following a twitter discussion, I think digit separators should consider the C++ history and not necessarily repeat it because it was contended:

wg21.link/N3499
wg21.link/N3661 originally the digit separator was _
wg21.link/N3781 this paper changes it to '

I wasn't at these meetings, but will ping someone who was for context on the associated complexity.

littledan commented 6 years ago

Cc @samuelgoto @rwaldron

littledan commented 6 years ago

This bug might be better filed under https://github.com/tc39/proposal-numeric-separator

rwaldron commented 6 years ago

@littledan

This bug might be better filed under

Yes, I suppose that would be more appropriate.

According to http://wg21.link/n3448, the reason for the change was to avoid conflict with user-defined literals:

(Example from n3448)

Consider:

0xdead_beef_db

Is "_db" a suffix indicating a user-defined literal or two additional hexadecimal digits? What about "_beef_db"?

...But this is not an issue in JavaScript today, and I'm not sure why this proposal intentionally created a conflict by using _, but I'd suggest that JavaScript do the exact opposite of C++: use some other character or characters for ExtendedNumericLiteral, and keep _ for NumericLiteralSeparator (so that JavaScript can avoid being an outlier among peer languages that have _).

Here are some suggestions:

10(px)
10[px]
10:px
10'px

domenic commented 6 years ago

None of those suggestions work for the current shape of the proposal, which requires the characters to be valid in identifiers. I guess that leaves us with 10$px.

rwaldron commented 6 years ago

None of those suggestions work for the current shape of the proposal

That's correct, which is why I suggested that the proposal be reshaped.

requires the characters to be valid in identifiers

That's an arbitrary requirement.

This can still work exactly as expected:

let { px } = CSS;

document.querySelector("#foo").style.fontSize = 3:px;
document.querySelector("#foo").style.fontSize = 3'px;

If the px is a valid identifier, then the : or ' indicate "Here's where the NumericLiteral ends and the IdentifierPart begins". The grammar:

ExtendedNumericLiteral ::
  NumericLiteral `:` IdentifierPart

or

ExtendedNumericLiteral ::
  NumericLiteral `'` IdentifierPart

(IMO, : is the best of the two)

domenic commented 6 years ago

See https://github.com/tc39/proposal-extended-numeric-literals/issues/2#issuecomment-361384633

jfbastien commented 6 years ago

Have we discussed using Unicode (either for digit separator, or extended numeric literal delimiter)? So many good possibilities in Unicode.

littledan commented 6 years ago

@rwaldron I share @domenic's concern with shadowing. It can come up with nested scopes and const, not just with var. Any mangling scheme would break the property of lexical scoping that you can textually search for all usages given a definition in a direct way. At the same time, $ looks pretty ugly...

je4d commented 6 years ago

@rwaldron is entirely correct about the C++ situation: _ was considered the preferred token for a digit separator, it seemed visually acceptable to people at the time, and, importantly, it didn't require changes to the preprocessor because _ is part of the pp-number grammar.

UDLs were already part of the standard by that point, and _ as a digit separator creates ambiguities with them. The ambiguities are especially bad with hex literals, e.g. 0x1234_b would not invoke a _b UDL for some number of bytes, because b is itself a hex digit. The proposed disambiguation was 0x1234.._b, which in the end wasn't a good enough answer to get consensus on _. That was at the early 2013 meeting.

Some further analysis showed that ' was a viable option, even though it required a bit more cunning to actually specify, and that got adopted a couple of meetings later.
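To make the hex-literal ambiguity concrete, here is a small sketch in JavaScript. The helper possibleReadings is hypothetical (it is not taken from any of the WG21 papers) and only models the question "where do the digits end and the suffix begin?" when _ can be both a digit separator and the start of a UDL-style suffix.

```javascript
// Hypothetical helper: list every way a hex literal written with `_` could be
// read if `_` is both a digit separator and the start of a literal suffix.
function possibleReadings(literal) {
  const body = literal.replace(/^0x/i, "");
  const groups = body.split("_");
  const readings = [];
  // Try every split point: the first k groups are digits, the rest is the suffix.
  for (let k = 1; k <= groups.length; k++) {
    const digitGroups = groups.slice(0, k);
    const suffixGroups = groups.slice(k);
    // Every digit group must be non-empty and made only of hex digits.
    if (!digitGroups.every((g) => /^[0-9a-f]+$/i.test(g))) continue;
    const suffix = suffixGroups.length ? "_" + suffixGroups.join("_") : "";
    // A suffix, if present, must look like an identifier starting with `_`.
    if (suffix && !/^_[A-Za-z0-9_]+$/.test(suffix)) continue;
    readings.push({ digits: digitGroups.join(""), suffix });
  }
  return readings;
}

const ambiguous = possibleReadings("0xdead_beef_db"); // three candidate readings
const unambiguous = possibleReadings("0x12_px");      // "px" is not hex, so one reading
```

The ambiguity only shows up when the suffix happens to consist of hex digits, which is why _db and _b are the awkward cases while something like _px is not.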

rwaldron commented 6 years ago

@je4d thanks for that additional background!

samuelgoto commented 6 years ago

thanks for the background!

FWIW, I'm open to revisiting the specific character we use for numeric separators, if there is a cross-cutting concern with this specific feature here. as @rwaldron mentioned, i think the trade-offs are currently well-balanced (in that consistency with other languages does play a big role), but lmk if you think we are cornering ourselves there.

littledan commented 6 years ago

Well, _ definitely looks like a digit separator, but I don't yet see a clear way out which makes everything line up beautifully. We can discuss various alternatives in other issues.

rbuckton commented 6 years ago

@rwaldron

(IMO, : is the best of the two)

However, in light of the "slice notation" proposal, this becomes ambiguous: o[3:px]. I do kind of like ' as an option.

@domenic

See #2

I agree that shadowing is a concern, but it's not a new concern. Regarding your example from that thread:

const { px } = CSS;

// later

const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);

// later, but still within the same scope
setHeight(5px); // tries to call the wrong px

As the developer you could easily choose a different name for either variable, i.e.: const { px: pixels } = CSS; or const pixels = figureOutPixelsForBorderWidth();. If you unintentionally shadowed px you'll get a runtime error or things won't work quite as expected, but that's always a possibility with shadowing even outside of this proposal.
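The failure mode can be reproduced in today's JavaScript by writing the call by hand. This is only a sketch under assumptions: units stands in for a namespace such as CSS, and px(5) stands in for the proposed 5px; neither reflects the proposal's actual semantics.

```javascript
// `units` is a stand-in for a namespace such as CSS (an assumption for this sketch).
const units = { px: (n) => `${n}px` };

function demo() {
  const { px } = units;
  const results = [];
  results.push(px(5)); // stands in for `5px`: calls units.px

  {
    // An inner declaration shadows the unit function.
    const px = 12;
    try {
      results.push(px(5)); // stands in for `5px`: now tries to call a number
    } catch (err) {
      results.push(err instanceof TypeError ? "TypeError" : "other error");
    }
  }
  return results;
}

const outcome = demo();
```

In the inner block the lookup finds the number instead of the function, so the call fails at runtime, which is the ordinary behaviour of shadowing.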

If you consider this to be an untenable solution, another option is to require a double-underscore prefix, since double-underscore is disallowed in numeric literal separators, i.e. const { px: __px } = CSS; and setHeight(5__px). Yes, 5__px is less appealing than 5_px, but it solves the ambiguity.
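The premise that a double underscore is rejected inside a numeric literal can be checked directly. The sketch below assumes an engine that already implements numeric separators (which was not a given when this thread was written); parses is a helper invented for this example.

```javascript
// Hypothetical helper: does `source` parse as an expression in this engine?
function parses(source) {
  try {
    new Function(`return (${source});`); // parse only, never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const single = parses("1_000");   // one underscore between digits
const doubled = parses("1__000"); // two adjacent underscores
const suffixed = parses("5__px"); // not valid syntax today
```

Because 1__000 is rejected, a form like 5__px is not claimed by the separator grammar, which is what would leave it free for the suffix syntax suggested above.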

rwaldron commented 6 years ago

@rbuckton I'm all for '

Also...

I agree that shadowing is a concern, but it's not a new concern.

And the _ prefix does absolutely nothing to prevent shadowing.

Yes, 5__px is less appealing than 5_px, but it solves the ambiguity.

I believe this is safe, but I would want to check the Babel/Babylon parser to prove it.

littledan commented 6 years ago

And the _ prefix does absolutely nothing to prevent shadowing.

What _ does to prevent shadowing is make it significantly less likely to run into in practice, based on a look at a number of likely suffix names like i and px. ' isn't a valid character in an identifier, so I don't understand what the suggestion is.

matthew-dean commented 6 years ago

' seems like it would make parsing, linting, and code-colouring a lot more difficult.

hax commented 6 years ago

@matthew-dean

' seems like it would make parsing, linting, and code-colouring a lot more difficult.

As I understand it, ' would be part of the numeric token, so it wouldn't make parsing or linting more difficult. As for code coloring, that is possible, because most syntax highlighters use regexps instead of a full scanner/parser. But I have always thought that tooling issues are secondary.

hax commented 6 years ago

I agree with @rwaldron and @rbuckton. I have to say that forcing a _ prefix in userland code to avoid the shadowing issue is weird and, in my personal opinion, not very JavaScript-style.

I'm curious why a namespace-based solution like import {literal i} from "imaginary" was rejected. I much prefer that approach, because it totally eliminates the shadowing problem.

peey commented 5 years ago

@littledan I don't think we should worry a lot about the separator being a part of the suffix identifier as a way to minimize clashes. For one, it breaks the beautiful:

let { px } = CSS;

For another, I think it's confusing to have arbitrary restrictions on identifier names for them to be usable in a particular context. It also creates cognitive load for users who would otherwise freely create _identifier, and makes them think that names starting with _ should be reserved for extended numeric literals.

Many users will just be able to use im for imaginary numbers, meters for meters but cm for centimeters and so on. And if the separator is different from $ or _ then actually it helps the users, who can use _m for meters, making the whole literal read as 1'_m or maybe 1 _m

littledan commented 5 years ago

This current proposal does have _ as part of the identifier. Do you have another idea for how to avoid the likely clash with i?

zenparsing commented 5 years ago

C# deals with this problem in "attributes" by appending "Attribute" to the identifier found in the source text.

littledan commented 5 years ago

I think concatenating and cooking up new local names would go against some of the committee's goals about the integrity of lexical scoping.

zenparsing commented 5 years ago

I see that you've addressed "name mangling" previously (apologies for not having the context), but I'm not quite sure what the objection is.

Following the C# attributes model, I would imagine that it would work like this:

const { pxUnits } = CSS;
setHeight(5px); // References "pxUnits" in scope

This conveniently avoids both the overlap with numeric separators and local name shadowing issues, and the technique has well established prior art in C#.
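A rough emulation of that lookup is sketched below, with the lexical scope modelled as a plain object. Real syntax would resolve the name lexically; evalLiteral, the scope object, and the exact handling of the Units suffix are assumptions made for illustration only.

```javascript
// Emulates `5px` resolving to `pxUnits(5)`; `scope` stands in for the lexical scope.
function evalLiteral(text, scope) {
  const match = /^(\d+(?:\.\d+)?)([A-Za-z]\w*)$/.exec(text);
  if (!match) throw new SyntaxError(`not an extended numeric literal: ${text}`);
  const [, digits, suffix] = match;
  const mangled = `${suffix}Units`; // name mangling in the style of C# attributes
  const handler = scope[mangled];
  if (typeof handler !== "function") {
    throw new ReferenceError(`${mangled} is not defined`);
  }
  return handler(Number(digits));
}

const scope = {
  pxUnits: (n) => `${n}px`,
  px: 12, // an unrelated local named `px` does not interfere with the lookup
};

const height = evalLiteral("5px", scope);
```

The local px and the literal suffix px never meet, because the suffix is looked up under its mangled name.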

I think concatenating and cooking up new local names would go against some of the committee's goals about the integrity of lexical scoping.

Why would this violate the integrity of lexical scoping?

peey commented 5 years ago

This current proposal does have _ as part of the identifier. Do you have another idea for how to avoid the likely clash with i?

My argument was to offload the burden of declashing to users and library authors, and keep the spec simple, i.e. where a clash can be avoided they'll use some other identifier, for instance im for imaginary units, and where clashes are anticipated they'll use _m for meters or _px for pixels. This will work if the separator is something other than $ or _.

But it might be a naive argument.

littledan commented 5 years ago

Cc @erights @waldemarhorwat re integrity of lexical scoping.

littledan commented 5 years ago

Note, it is more than just library authors, e.g. when you import it, you need to write the name.

waldemarhorwat commented 5 years ago

What's the question about integrity of lexical scoping?

littledan commented 5 years ago

@waldemarhorwat Specifically, if we made 3px desugar into prefixpx(3), and asked you to import px as import {prefixpx} from "mymodule", would this break the expectations that you have when you see 3px and think about px as something that is related to a lexically scoped variable?

waldemarhorwat commented 5 years ago

A different way of spelling an identifier like what's proposed here has no effect on the integrity of lexical scoping.

To illustrate, an example of something that would have an effect on the integrity of lexical scoping would be a feature that lets you access a lexically scoped variable from outside the scope or a feature that lets an outsider change the mapping of lexically scoped references to their definitions like what with and some versions of eval do.
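For contrast, here is a sketch of the with case. Since with is only available in sloppy mode, the function is built with the Function constructor, which produces non-strict code; resolveX and the outsider parameter are names invented for this example.

```javascript
// Built via the Function constructor so the body is sloppy-mode code,
// where `with` is still permitted.
const resolveX = new Function(
  "outsider",
  `
  var x = "lexical";
  with (outsider) {
    return x; // which x this means depends on the object passed in
  }
  `
);

const untouched = resolveX({});                 // no `x` property: the local variable wins
const redirected = resolveX({ x: "hijacked" }); // the caller remaps the reference
```

Here the caller decides what x refers to, which is the kind of outside interference a different spelling of an identifier does not introduce.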

What I see here is a question about usability. Will enough users realize that the px in 3px is just another spelling of the identifier prefixpx? That's an interesting question that I don't have a definitive answer to.

erights commented 5 years ago

E used a name mangling technique like this in several places, where a use of a simple identifier in a funny syntactic context desugared to naming a mangled lexical variable name with the original identifier as a prefix, and a funny suffix specific to the funny syntactic context. For example, the E quasi-literals, which became the JS template literals, would expand

tag`str`

into

tag__quasiParser.substitute("str", [])

These suffixes always began with __, and so would never conflict with an identifier lacking a double underscore. Given the special treatment of __proto__ in some contexts, perhaps that double underscore rule is not a bad generalization.

In any case, I think we should either use the exact identifier the user wrote as the lexical variable name, or if we mangle, we should adopt a convention for this and all future mangles such that users can easily know how to choose names that never conflict with any possible mangled name.

So, for this case, px or prefix__px or px__suffix would be acceptable, but prefixpx would not.
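The difference between the two kinds of mangling can be sketched as follows. Both schemes and all names here are hypothetical; the only rule taken from the comment above is that mangled names contain a double underscore and user code can steer clear of it.

```javascript
// Two hypothetical mangling schemes for the suffix in `3px`.
const mangleWithMarker = (suffix) => `${suffix}__suffix`; // contains "__"
const mangleByConcat = (suffix) => `prefix${suffix}`;     // no marker at all

// The convention: user code that avoids "__" can never hit a marked name.
const followsConvention = (name) => !name.includes("__");

// Which convention-following user names could be captured by a mangled suffix?
function collisions(userNames, suffixes, mangle) {
  const mangled = new Set(suffixes.map(mangle));
  return userNames.filter((n) => followsConvention(n) && mangled.has(n));
}

const userNames = ["px", "prefixpx", "height"];
const withMarker = collisions(userNames, ["px", "em"], mangleWithMarker);
const byConcat = collisions(userNames, ["px", "em"], mangleByConcat);
```

With the marker scheme no convention-following name is captured, whereas plain concatenation captures prefixpx, a name a user could plausibly have chosen without knowing about the mangling.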

erights commented 5 years ago

Historical only note:

We did not suggest a mangling scheme for template literal tags only because there was no precedent in JS for such mangling, and we did not want to start down that road at that time. If we do end up with such mangling elsewhere in JS, I will regret that template literal tags are not mangled.

littledan commented 5 years ago

Great, I am glad I was misunderstanding. This is all pointing towards a solution which will allow numeric separators to continue to use _! I will think on this more. Cc @leobalter @samuelgoto @rwaldron

tabatkins commented 5 years ago

For one, it breaks the beautiful:
let { px } = CSS; 

That won't work anyway, as CSS.px is designed for direct human usage and just takes a single number, rather than being designed for unit usage and taking the string/etc arguments.

Whatever name-mangling is decided here, I anticipate adding the mangled names to the CSS namespace as well, so they can be imported easily. (And maybe have an easy installer for them? Or just prominently recommend doing Object.assign(window, CSS.units) in the spec, dunno.)

littledan commented 5 years ago

@tabatkins I like the idea of enabling programmers to not have to reference the mangled names in that sort of way, but at that point, why not just put them in the global scope to start with? partial interface mixin WindowOrWorkerGlobalScope { ... } and all that jazz.

tabatkins commented 5 years ago

We def could, but CSS already has 30+ units (see https://drafts.css-houdini.org/css-typed-om/#numeric-factory), and we add more semi-regularly. There's no guarantee they won't clash with other builtins the web platform might want.

tc39 / proposal-extended-numeric-literals

Digit separators #8