tc39 / proposal-extended-numeric-literals

Extensible numeric literals for JavaScript
https://tc39.github.io/proposal-extended-numeric-literals/

Separate user/host defined literals from ES-defined ones (e.g. 123_px vs. 123n) #2

Open domenic opened 6 years ago

domenic commented 6 years ago

One problem people bring up with this proposal is that it prevents TC39 from ever adding new literals to the language.

There are a few solutions to this. E.g. maybe user-defined ones always override the built-in ones. But one I haven't seen mentioned is the one used by C++, where user-defined literals need a separator (123_px). Only the language spec can define ones with no separator. Thus there's no collision.

I think hosts would also use this separator for their literals (e.g. CSS typed OM defining a px literal), as that way hosts stay "just libraries" in some sense.

littledan commented 6 years ago

This is a good idea; I think that was a sort of killer issue when I presented this. However, it goes exactly contrary to another goal, which would be to allow polyfilling built-in things. I'm not sure how to square that circle.

tabatkins commented 6 years ago

So the current proposal requires a _ prefix on the userland suffixes, which seems to rub some people the wrong way.

What if we just reserved all the single-letter suffixes for JS (52 of them in ASCII), and let userland have everything else?

domenic commented 6 years ago

The problem there is that then lexical variables as the lookup mechanism become much less feasible; it means that in programs that use units, you can no longer have variables named (e.g.) px, because then you'll get shadowing conflicts. Concrete example:

const { px } = CSS;

// later

const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);

// later, but still within the same scope
setHeight(5px); // tries to call the wrong px

This is especially bad with loop indices (i, j, etc.), although your idea of reserving single-character suffixes sidesteps that.

The _ prefix is IMO an elegant way of segregating built-ins from user-defined, and also solves the scoping overlap issues without resorting to the many strange ideas considered previously (see https://github.com/tc39/proposal-extended-numeric-literals/tree/4902dc6d7da56d7572cd1999f71203fb54275755#scoping-variants).

tabatkins commented 6 years ago

Ah, that's a very reasonable objection. Possible workaround - the userland literal syntax is just "any 2+ ident chars", but the lookup is for the ident mangled in a specific way (perhaps with a "_" prefix, or something more explicit).

(Ah, I see that's already covered by the mangling idea in the history you link to. Still might be worth considering - the cost of mangling during definition/import vs the cost of an extra char at every invocation is a non-obvious balance.)

littledan commented 6 years ago

A downside of mangling is that programmers have to be very aware of the mangling scheme, e.g. when importing the suffix.
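
A minimal sketch of the mangling idea discussed above, emulated with plain functions since the literal syntax does not exist in any engine. The names `mangleSuffix`, `applySuffix`, and `scope` are hypothetical, and the `_` prefix rule is only one of the possible schemes mentioned in this thread.

```javascript
// Hypothetical mangling rule: the suffix `px` resolves to the binding `_px`.
function mangleSuffix(suffix) {
  return `_${suffix}`;
}

// Emulates what an engine might do for `5px` under this rule.
// `scope` stands in for the lexical environment; a real implementation
// would resolve an identifier, not an object property.
function applySuffix(scope, suffix, source) {
  const handler = scope[mangleSuffix(suffix)];
  if (typeof handler !== "function") {
    throw new ReferenceError(`no handler for suffix "${suffix}"`);
  }
  return handler(Object.freeze({ number: Number(source), string: source }));
}

const scope = {
  _px: ({ number }) => ({ value: number, unit: "px" }),
  px: 42, // an ordinary variable named px does not collide with the suffix
};

const height = applySuffix(scope, "px", "5");
console.log(height); // { value: 5, unit: 'px' }
```

The cost littledan points out is visible here: whoever defines or imports the handler has to know that it must be called `_px`, even though call sites only ever say `px`.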

rwaldron commented 6 years ago

const { px } = CSS;

// later

const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);

// later, but still within the same scope
setHeight(5px); // tries to call the wrong px

This specific example is actually a SyntaxError, because px has already been declared (the same applies to let bindings, which also cannot be redeclared). So, this would only be a problem when var is used? Adding the _ prefix doesn't make this problem go away, since it's still just a valid identifier:

// Using var because that's the only way this example actually holds up...
var { _px } = CSS;

// later

var _px = figureOutPixelsForBorderWidth();

// later, but still within the same scope
setHeight(5_px); // tries to call the wrong px

domenic commented 6 years ago

Yes, that example is a syntax error. The issue is that it wouldn't be a syntax error if suffixes required underscores:

const { _px } = CSS;

// later

const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);

// later, but still within the same scope
setHeight(5_px);
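
The redeclaration rules both sides rely on can be checked in a current engine without the proposed syntax. This is only a sketch: `parseErrorName` is a hypothetical helper, and the suffix function is replaced by an ordinary arrow function.

```javascript
// Returns the name of the error thrown when `source` is parsed as a
// function body, or null if it parses cleanly. Nothing is executed.
function parseErrorName(source) {
  try {
    new Function(source);
    return null;
  } catch (err) {
    return err.name;
  }
}

// Suffix function and local variable share the name px: rejected for const and let.
const sameNameConst = parseErrorName("const px = () => {}; const px = 1;");
const sameNameLet = parseErrorName("let px = () => {}; let px = 1;");

// var permits redeclaration, so the second binding silently replaces the first.
const sameNameVar = parseErrorName("var px = () => {}; var px = 1;");

// With an underscore-prefixed suffix function, the two names are distinct.
const distinctNames = parseErrorName("const _px = () => {}; const px = 1;");

console.log(sameNameConst, sameNameLet, sameNameVar, distinctNames);
// SyntaxError SyntaxError null null
```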

rwaldron commented 6 years ago

The issue is that it wouldn't be a syntax error if suffixes required underscores:

Follow me...

The lexical grammar as it is:

NumericLiteral `_` IdentifierPart

(which I believe is wrong anyway)

The example given in the readme:

3_px desugars into _px(Object.freeze({number: 3, string: "3"}))

Assume that IdentifierName is the grammar, and that a program has some user-defined literal _px (which is an IdentifierName): the _ of _px is not the _ part of the ExtendedNumericLiteral; it's the first character of the IdentifierName _px! That means, to respect the grammar, you'd actually write 3__px. This is why I'm saying that the grammar-defined _ is pointless.

There's no reason why this couldn't simply be:

ExtendedNumericLiteral ::     NumericLiteral IdentifierName

And your exact program above would work exactly the same way.
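
For reference, the desugaring quoted from the readme can be written out by hand in today's JavaScript. The `_px` body below is a hypothetical handler, not something the proposal defines; the second half illustrates one plausible reason for passing the source string alongside the number.

```javascript
// Hypothetical suffix function that the desugared call would invoke.
function _px(literal) {
  return { value: literal.number, unit: "px" };
}

// Hand-written equivalent of the proposed `3_px`.
const width = _px(Object.freeze({ number: 3, string: "3" }));
console.log(width); // { value: 3, unit: 'px' }

// The string form keeps digits that a double cannot represent.
const big = Object.freeze({
  number: 9007199254740993, // rounds to 9007199254740992 when parsed
  string: "9007199254740993",
});
console.log(big.number === 9007199254740992); // true
console.log(BigInt(big.string) === 9007199254740993n); // true
```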

domenic commented 6 years ago

It seems there may be some bugs in the specification (I defer to @littledan there), but hopefully you can understand the point I'm trying to make: by requiring the identifier to be _px for the literal-function, we allow the program to still contain other identifiers named px with no collision.

littledan commented 6 years ago

Thanks for explaining the grammar bug, @rwaldron.

I'm fine with changing to a different scheme for choosing whether a suffix is built in or not. As @not-an-aardvark explained in https://github.com/tc39/proposal-extended-numeric-literals/issues/7 , the scheme here doesn't even work due to numeric separators and hex literals. Can we discuss here what other scheme might be better?
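
One way such an ambiguity can arise, shown with syntax that engines supporting numeric separators (ES2021) already accept. The details of the linked issue are not reproduced here, so the specific literal is an assumption chosen for illustration.

```javascript
// Numeric separators already give `_` a meaning inside numeric literals.
const hexWithSeparator = 0x1_b; // parsed as the hex literal 0x1b
const decimalWithSeparator = 1_000;

console.log(hexWithSeparator); // 27
console.log(decimalWithSeparator); // 1000
```

Under a `_`-prefixed suffix scheme, `0x1_b` could also be read as the literal `0x1` followed by a suffix `_b`, because `b` is both a hex digit and a valid identifier character.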

rwaldron commented 6 years ago

by requiring the identifier to be _px for the literal-function, we allow the program to still contain other identifiers named px with no collision.

That doesn't hold up because it doesn't prevent code in the same scope from declaring an Identifier whose first character is _; demonstrated with var declarations:

var { _px } = CSS;

// later

var _px = figureOutPixelsForBorderWidth();
setBorderWidth(_px);

// later, but still within the same scope
setHeight(5_px);

domenic commented 6 years ago

The idea is that it's not great if a useful library feature (e.g. the CSS library) prevents you from using the variable name px, but it's fine if using that library feature prevents you from using the name _px.

littledan commented 6 years ago

Agree with @domenic that things are likely to be more OK in practice with _. However, I don't know a good solution to the numeric separator ambiguity issue.

not-an-aardvark commented 6 years ago

How is the CSS library preventing you from using the variable name px? You could always use a different name:

var { px: pixels } = CSS;

var px = somethingElse;

setHeight(5_pixels);

I don't see how this would be different from any other property on a host object, such that it would warrant only making the feature compatible with identifiers that have particular names.

rwaldron commented 6 years ago

@littledan

Can we discuss here what other scheme might be better?

I suggested above:

ExtendedNumericLiteral ::     NumericLiteral IdentifierName

I don't think there needs to be any differentiation between user-defined and built-in ExtendedNumericLiteral.


@not-an-aardvark I was just about to suggest: var { px: _px } = CSS; :)

littledan commented 6 years ago

@rwaldron This issue would come up with let or const declarations as well--they just need to be in a nested scope. An overlap would be particularly probable if anyone uses, e.g., an i suffix (complex numbers?). If we don't start the name with a prefix like _, we may need some other solution such as mangling, a separate namespace, or using a property of a shared object (all have complexity and downsides).
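
A sketch of the nested-scope overlap, using ordinary bindings in place of the proposed literal syntax. The complex-number handler `i` is hypothetical.

```javascript
// Hypothetical complex-number suffix function, named `i` with no prefix.
const i = ({ number }) => ({ re: 0, im: number });

function describeBindings() {
  const results = [];
  for (let i = 0; i < 2; i++) {
    // Inside the loop, `i` is the counter. Under lexical lookup, a literal
    // such as `5i` would resolve to this inner binding, not the function.
    results.push(typeof i);
  }
  // Outside the loop, the suffix function is visible again.
  results.push(typeof i);
  return results;
}

console.log(describeBindings()); // [ 'number', 'number', 'function' ]
```

No redeclaration error is raised here, because the loop counter lives in its own block scope and merely shadows the outer binding.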

littledan commented 6 years ago

I don't think there needs to be any differentiation between user-defined and built-in ExtendedNumericLiteral.

Do you think local overrides of a variable named n should change the interpretation of BigInt literals?

not-an-aardvark commented 6 years ago

This issue would come up with let or const declarations as well--they just need to be in a nested scope. An overlap would be particularly probable if anyone uses, e.g., an i suffix (complex numbers?).

Wouldn't this already be a problem if someone is using a function call?

import { i } from 'complex-number-library';

for (let i = 0; i < 10; i++) {
  const fiveTimesI = i(5); // error!
}

It seems like the problem of variable overlap is inherent with all variable accesses, and isn't specific to this feature. The problem is only notable in this case because our examples happen to be using variable names which are very short, increasing the likelihood of a collision. The solution would be to simply use longer variable names.

const distance = 5_m; // inadvisable

const distance = 5_meters; // ok

not-an-aardvark commented 6 years ago

Do you think local overrides of a variable named n should change the interpretation of BigInt literals?

I realize the question wasn't aimed at me, but that seems counter-intuitive to me -- most polyfills work by overriding an existing global, but BigInt literals are not implemented by setting global.n.

tabatkins commented 6 years ago

but BigInt literals are not implemented by setting global.n.

Not yet, but if we allow userspace literals to live in the same namespace as language-defined ones, then BigInt would get back-explained as being implemented by an implicit global.n. (Probably not observable, but still shadow-able by userspace.)

tabatkins commented 6 years ago

Wouldn't this already be a problem if someone is using a function call? [example using i as a looping var, plus i as a complex suffix]

Yes, and that's the point - the attractive single-letter suffixes we're gonna want for future language-defined features will often clash with userspace variables accidentally, which suggests that we don't want them to be trivially overridable. Either the language-defined things carve out some of the syntax space for themselves, or mangling goes on, or something else preventing common variables from overriding desirable short suffixes.

not-an-aardvark commented 6 years ago

I agree that name clashes with future language extensions are a problem. I am arguing that if we use a syntax which distinguishes between language-provided and user-provided suffixes, then we shouldn't unnecessarily restrict the syntax to avoid name clashes between local variables.

Separately, the idea of setting global.n to a function seems unappealing to me, since it's a very confusing name when used anywhere other than after a numeric literal.

rwaldron commented 6 years ago

A little late in getting back to this, but @not-an-aardvark appears to have the salient points well covered.

@littledan

Do you think local overrides of a variable named n should change the interpretation of BigInt literals?

Of course not and I'm surprised that this was the leap you made from me saying "I don't think there needs to be any differentiation between user-defined and built-in ExtendedNumericLiteral" which is a statement that is explicitly, exclusively about the syntax, not the semantics, and not what happens when user code does var n = "gotcha".

samuelgoto commented 6 years ago

If we don't start the name with a prefix like _, we may need some other solution such as mangling, a separate namespace, or using a property of a shared object (all have complexity and downsides).

Can you give me an idea of what "separate namespace" or "using a property of a shared object" would look like? Are we talking about something like:

global[Symbol.for("custom-literal-i")] = function() {
  // foo bar, custom literal
}

let a = 123i;
// equivalent to let a = global[Symbol.for("custom-literal-i")]({value: "123", number: 123});

If so, what are the trade-offs here?

What would namespaces be like and what would their trade-offs be?
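
A runnable sketch of the shared-object variant, under stated assumptions: the key scheme `custom-literal-<suffix>` and the helper `literalKey` are hypothetical, `globalThis` stands in for `global`, and the proposed `123i` is replaced by an explicit call. `Symbol(description)` returns a fresh symbol on every call, so a lookup by description has to go through the registry with `Symbol.for`.

```javascript
// Hypothetical key scheme: one registry symbol per suffix.
const literalKey = (suffix) => Symbol.for(`custom-literal-${suffix}`);

// Register a handler for the suffix `i` on the shared global object.
globalThis[literalKey("i")] = ({ number }) => ({ re: 0, im: number });

// Hand-written stand-in for the proposed `123i`.
const a = globalThis[literalKey("i")](
  Object.freeze({ number: 123, string: "123" })
);

// An ordinary variable named i does not interfere with the lookup.
const i = 7;

console.log(a, i); // { re: 0, im: 123 } 7
```

One trade-off visible even in this sketch is that the registry is global: two libraries registering the same suffix would overwrite each other, whereas lexical lookup keeps handlers scoped to the importing module.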