domenic opened this issue 6 years ago
This is a good idea; I think that was a sort of killer issue when I presented this. However, it goes exactly contrary to another goal, which would be to allow polyfilling built-in things. I'm not sure how to square that circle.
So the current proposal requires a _ prefix on the userland suffixes, which seems to rub some people the wrong way.
What if we just reserved all the single-letter suffixes for JS (52 of them in ASCII), and let userland have everything else?
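A minimal sketch of that rule, assuming "single-letter" means exactly one ASCII letter. `isReservedSuffix` is a hypothetical helper for illustration, not anything defined by the proposal:

```javascript
// Hypothetical classifier for the "reserve single-letter suffixes" idea.
// Assumption: only ASCII letters count, so A-Z and a-z are reserved for JS.
function isReservedSuffix(suffix) {
  return /^[A-Za-z]$/.test(suffix);
}

// Walk the ASCII range and collect every character the rule reserves.
const letters = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (isReservedSuffix(ch)) letters.push(ch);
}

console.log(letters.length);          // 52
console.log(isReservedSuffix("n"));   // true  (language-owned, like BigInt's n)
console.log(isReservedSuffix("px"));  // false (left to userland)
```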
The problem there is that then lexical variables as the lookup mechanism become much less feasible; it means that in programs that use units, you can no longer have variables named (e.g.) px, because then you'll get shadowing conflicts. Concrete example:
const { px } = CSS;
// later
const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);
// later, but still within the same scope
setHeight(5px); // tries to call the wrong px
This is especially bad with loop indices (i, j, etc.), although your reserve-single-character-suffixes idea sidesteps that.
The _ prefix is IMO an elegant way of segregating built-ins from user-defined suffixes, and it also solves the scoping overlap issues without resorting to the many strange ideas considered previously (see https://github.com/tc39/proposal-extended-numeric-literals/tree/4902dc6d7da56d7572cd1999f71203fb54275755#scoping-variants).
Ah, that's a very reasonable objection. Possible workaround - the userland literal syntax is just "any 2+ ident chars", but the lookup is for the ident mangled in a specific way (perhaps with a "_" prefix, or something more explicit).
(Ah, I see that's already covered by the mangling idea in the history you link to. Still might be worth considering: the cost of mangling during definition/import vs. the cost of an extra char at every invocation is a non-obvious balance.)
A downside of mangling is that programmers have to be very aware of the mangling scheme, e.g. when importing the suffix.
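A rough sketch of the mangling idea, assuming the mangled name is simply the suffix with a `_` prefix and that userland suffixes must be at least two identifier characters. `mangle` and `resolveSuffix` are hypothetical names, and a plain object stands in for the lexical scope:

```javascript
// Hypothetical mangling scheme: the literal `3px` would look up a binding named `_px`.
const MIN_USER_SUFFIX_LENGTH = 2;

function mangle(suffix) {
  if (suffix.length < MIN_USER_SUFFIX_LENGTH) {
    throw new SyntaxError(`suffix "${suffix}" is reserved for the language`);
  }
  return "_" + suffix;
}

// `scope` models the bindings visible where the literal appears.
function resolveSuffix(scope, suffix) {
  const fn = scope[mangle(suffix)];
  if (typeof fn !== "function") {
    throw new TypeError(`no literal function bound for suffix "${suffix}"`);
  }
  return fn;
}

// The importer must know the scheme: it binds `_px`, not `px`.
const scope = {
  _px: (lit) => `${lit.number}px`,
  px: 17, // an ordinary variable named px does not interfere
};

const record = Object.freeze({ number: 3, string: "3" });
console.log(resolveSuffix(scope, "px")(record)); // 3px
```

The comment on `_px` is exactly the downside mentioned above: whoever writes the import has to spell the mangled name.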
const { px } = CSS;
// later
const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);
// later, but still within the same scope
setHeight(5px); // tries to call the wrong px
This specific example is actually a SyntaxError, because px has already been declared (this would apply to let bindings as well, which also cannot be redeclared). So, this would only be a problem when var is used? Adding the _ prefix doesn't make this problem go away, since it's still just a valid identifier:
// Using var because that's the only way this example actually holds up...
var { _px } = CSS;
// later
var _px = figureOutPixelsForBorderWidth();
// later, but still within the same scope
setHeight(5_px); // tries to call the wrong px
Yes, that example is a syntax error. The issue is that it wouldn't be a syntax error if suffixes required underscores:
const { _px } = CSS;
// later
const px = figureOutPixelsForBorderWidth();
setBorderWidth(px);
// later, but still within the same scope
setHeight(5_px);
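Since `5_px` is not valid JavaScript today, here is a runnable approximation of the example above with the literal written out as the call it would desugar to, assuming (per the readme) that the literal calls `_px` with a frozen `{ number, string }` record. `CSS`, `figureOutPixelsForBorderWidth`, `setBorderWidth`, and `setHeight` are stubs made up for the sketch:

```javascript
// Stubbed stand-ins for the library and helpers in the example.
const CSS = { _px: (lit) => ({ value: lit.number, unit: "px" }) };
const figureOutPixelsForBorderWidth = () => 2;
const applied = [];
const setBorderWidth = (w) => applied.push(["borderWidth", w]);
const setHeight = (h) => applied.push(["height", h]);

const { _px } = CSS;
// later
const px = figureOutPixelsForBorderWidth(); // no clash: px and _px are different bindings
setBorderWidth(px);
// later, but still within the same scope
// `5_px` would desugar to roughly this call:
setHeight(_px(Object.freeze({ number: 5, string: "5" })));

console.log(applied.length); // 2
```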
The issue is that it wouldn't be a syntax error if suffixes required underscores:
Follow me...
The lexical grammar as it is:
NumericLiteral `_` IdentifierPart
(which I believe is wrong anyway)
The example given in the readme:
3_px
desugars into _px(Object.freeze({number: 3, string: "3"}))
Assume the grammar used IdentifierName (rather than IdentifierPart), and that a program has some user-defined literal _px (which is an IdentifierName): the _ of _px is not the _ part of the ExtendedNumericLiteral; it's the first character of the IdentifierName _px! That means, to respect the grammar, you'd actually write 3__px. This is why I'm saying that the _ defined in the grammar is pointless.
There's no reason why this couldn't simply be:
ExtendedNumericLiteral ::
NumericLiteral IdentifierName
And your exact program above would work exactly the same way.
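To make the point concrete, here is a toy tokenizer for the two candidate grammars. It only handles decimal integers and ASCII identifiers (a deliberate simplification; the real NumericLiteral and IdentifierName productions are much richer), it assumes the callee is exactly the IdentifierName the grammar matched, and the names are hypothetical:

```javascript
// Grammar A (current):  NumericLiteral `_` IdentifierName
// Grammar B (proposed): NumericLiteral IdentifierName
// Simplification: NumericLiteral = decimal digits, IdentifierName = ASCII only.
const IDENT = "[A-Za-z$_][A-Za-z0-9$_]*";
const grammarA = new RegExp(`^([0-9]+)_(${IDENT})$`);
const grammarB = new RegExp(`^([0-9]+)(${IDENT})$`);

// Returns the name the literal would call, or null if it doesn't parse.
function calleeFor(grammar, source) {
  const match = grammar.exec(source);
  return match ? match[2] : null;
}

const a1 = calleeFor(grammarA, "3_px");  // "px": the `_` is consumed as the separator
const a2 = calleeFor(grammarA, "3__px"); // "_px": reaching _px needs a doubled underscore
const b1 = calleeFor(grammarB, "3_px");  // "_px": the readme example works as written
console.log(a1, a2, b1); // px _px _px
```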
It seems there may be some bugs in the specification (I defer to @littledan there), but hopefully you can understand the point I'm trying to make: by requiring the identifier to be _px for the literal-function, we allow the program to still contain other identifiers named px with no collision.
Thanks for explaining the grammar bug, @rwaldron.
I'm fine with changing to a different scheme for choosing whether a suffix is built in or not. As @not-an-aardvark explained in https://github.com/tc39/proposal-extended-numeric-literals/issues/7, the scheme here doesn't even work due to numeric separators and hex literals. Can we discuss here what other scheme might be better?
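A small sketch of that ambiguity, assuming the `_`-prefixed suffix scheme and handling only decimal and `0x` hex integers. `readings` is a hypothetical helper that lists each way a source string could be read; for simplicity, the number part of a suffixed reading has no separators of its own:

```javascript
// Numeric separators (1_000) and the `_` suffix marker (3_px) share `_`,
// so one source string can have two readings.
function readings(source) {
  const isHex = /^0[xX]/.test(source);
  const body = isHex ? source.slice(2) : source;
  const digit = isHex ? "[0-9A-Fa-f]" : "[0-9]";
  const toNumber = (digits) => (isHex ? parseInt(digits, 16) : Number(digits));
  const results = [];

  // Reading 1: a plain number that uses numeric separators.
  if (new RegExp(`^${digit}+(?:_${digit}+)*$`).test(body)) {
    results.push({ number: toNumber(body.replace(/_/g, "")), suffix: null });
  }

  // Reading 2: a number followed by a `_`-prefixed suffix.
  const match = new RegExp(`^(${digit}+)(_[A-Za-z0-9$_]+)$`).exec(body);
  if (match) {
    results.push({ number: toNumber(match[1]), suffix: match[2] });
  }
  return results;
}

console.log(readings("1_000"));  // two readings: 1000, or 1 with suffix "_000"
console.log(readings("0xa_bc")); // two readings: 0xabc (2748), or 0xa (10) with suffix "_bc"
console.log(readings("3_px"));   // one reading: 3 with suffix "_px"
```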
by requiring the identifier to be _px for the literal-function, we allow the program to still contain other identifiers named px with no collision.
That doesn't hold up because it doesn't prevent code in the same scope from declaring an Identifier whose first character is _; demonstrated with var declarations:
var { _px } = CSS;
// later
var _px = figureOutPixelsForBorderWidth();
setBorderWidth(_px);
// later, but still within the same scope
setHeight(5_px);
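A runnable version of that `var` example, with `5_px` written as the call the readme says it desugars to. `CSS` and `figureOutPixelsForBorderWidth` are stubs, and `setBorderWidth`/`setHeight` are left out because only the lookup matters here:

```javascript
// Stub library: `_px` is the literal function.
var CSS = { _px: (lit) => `${lit.number}px` };
var figureOutPixelsForBorderWidth = () => 2;

var { _px } = CSS;
// later: `var` allows redeclaring _px in the same scope without a SyntaxError
var _px = figureOutPixelsForBorderWidth();

// later, `5_px` would desugar to roughly this call, which now hits a number
let error = null;
try {
  _px(Object.freeze({ number: 5, string: "5" }));
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true: _px is not a function anymore
```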
The idea is that it's not great if a useful library feature (e.g. the CSS library) prevents you from using the variable name px, but it's fine if using that library feature prevents you from using the name _px.
Agree with @domenic that things are likely to be more OK in practice with _. However, I don't know a good solution to the numeric separator ambiguity issue.
How is the CSS library preventing you from using the variable name px? You could always use a different name:
var { px: pixels } = CSS;
var px = somethingElse;
setHeight(5_pixels);
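The same renaming trick, made runnable by writing `5_pixels` as a direct call. This assumes, as the comment does, that `5_pixels` calls the binding `pixels` (the `_` acting purely as a separator) with a frozen `{ number, string }` record; `CSS` and `somethingElse` are stubs:

```javascript
// Stub library and value.
var CSS = { px: (lit) => `${lit.number}px` };
var somethingElse = 42;

// Rename on destructuring so the local name `px` stays free.
var { px: pixels } = CSS;
var px = somethingElse;

// Assumed desugaring of `5_pixels`: call `pixels` with a frozen literal record.
var height = pixels(Object.freeze({ number: 5, string: "5" }));
console.log(px, height); // 42 5px
```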
I don't see how this would be different from any other property on a host object, such that it would warrant making the feature compatible only with identifiers that have particular names.
@littledan
Can we discuss here what other scheme might be better?
I suggested above:
ExtendedNumericLiteral ::
NumericLiteral IdentifierName
The remaining collision case is the same one that exists today with var declarations, and that's something JS programmers have lived with for over 20 years. I don't think there needs to be any differentiation between user-defined and built-in ExtendedNumericLiteral.
@not-an-aardvark I was just about to suggest: var { px: _px } = CSS;
:)
@rwaldron This issue would come up with let or const declarations as well; they just need to be in a nested scope. An overlap would be particularly probable if anyone uses, e.g., an i suffix (complex numbers?). If we don't start the name with a prefix like _, we may need some other solution such as mangling, a separate namespace, or using a property of a shared object (all have complexity and downsides).
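A runnable illustration of the nested-scope case, reusing the `px` example and assuming `5px` would desugar to a call of whatever `px` binding is in scope. `CSS` is a stub and `layout` is a hypothetical function:

```javascript
// Stub library; `5px` would desugar to px(Object.freeze({ number: 5, string: "5" })).
const CSS = { px: (lit) => `${lit.number}px` };
const { px } = CSS;

function layout() {
  // Legal: this const lives in a nested scope, so there is no redeclaration error.
  const px = 2; // e.g. the result of figureOutPixelsForBorderWidth()
  // Here `5px` would resolve to the local number, not the CSS literal function.
  try {
    return px(Object.freeze({ number: 5, string: "5" }));
  } catch (e) {
    return e;
  }
}

const outer = px(Object.freeze({ number: 5, string: "5" }));
const inner = layout();
console.log(outer);                      // 5px
console.log(inner instanceof TypeError); // true
```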
I don't think there needs to be any differentiation between user-defined and built-in ExtendedNumericLiteral.
Do you think local overrides of a variable named n should change the interpretation of BigInt literals?
This issue would come up with let or const declarations as well; they just need to be in a nested scope. An overlap would be particularly probable if anyone uses, e.g., an i suffix (complex numbers?).
Wouldn't this already be a problem if someone is using a function call?
import { i } from 'complex-number-library';
for (let i = 0; i < 10; i++) {
const fiveTimesI = i(5); // TypeError: here i is the loop counter, not the imported function
}
It seems like the problem of variable overlap is inherent with all variable accesses, and isn't specific to this feature. The problem is only notable in this case because our examples happen to be using variable names which are very short, increasing the likelihood of a collision. The solution would be to simply use longer variable names.
const distance = 5_m; // inadvisable
const distance = 5_meters; // ok
Do you think local overrides of a variable named n should change the interpretation of BigInt literals?
I realize the question wasn't aimed at me, but that seems counter-intuitive to me: most polyfills work by overriding an existing global, but BigInt literals are not implemented by setting global.n.
but BigInt literals are not implemented by setting global.n.
Not yet, but if we allow userspace literals to live in the same namespace as language-defined ones, then BigInt would get back-explained as being implemented by an implicit global.n. (Probably not observable, but still shadowable by userspace.)
Wouldn't this already be a problem if someone is using a function call? [example using i as a looping var, plus i as a complex suffix]
Yes, and that's the point: the attractive single-letter suffixes we're gonna want for future language-defined features will often clash with userspace variables accidentally, which suggests that we don't want them to be trivially overridable. Either the language-defined suffixes carve out some of the syntax space for themselves, or names get mangled, or some other mechanism keeps common variable names from overriding desirable short suffixes.
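A sketch of the difference between sharing a namespace and keeping a separate one, with plain objects standing in for scopes. `resolveShared` and `resolveSeparate` are hypothetical lookup strategies, and modelling BigInt's `n` as a suffix function is only the back-explanation under discussion, not how engines implement it:

```javascript
// The built-in `n` suffix, back-explained as a function (hypothetical).
const builtinSuffixes = Object.freeze({ n: (lit) => BigInt(lit.string) });

// Strategy 1: suffixes share the ordinary variable namespace,
// so a local binding is checked first and wins.
function resolveShared(localScope, name) {
  return name in localScope ? localScope[name] : builtinSuffixes[name];
}

// Strategy 2: language-defined suffixes live in their own namespace
// and ordinary variables cannot override them.
function resolveSeparate(localScope, name) {
  return name in builtinSuffixes ? builtinSuffixes[name] : localScope[name];
}

const localScope = { n: "gotcha" }; // user code did `var n = "gotcha"`
const lit = Object.freeze({ number: 5, string: "5" });

console.log(typeof resolveShared(localScope, "n")); // string: `5n` would break
console.log(resolveSeparate(localScope, "n")(lit)); // 5n
```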
I agree that name clashes with future language extensions are a problem. I am arguing that if we use a syntax which distinguishes between language-provided and user-provided suffixes, then we shouldn't unnecessarily restrict the syntax to avoid name clashes between local variables.
Separately, the idea of setting global.n to a function seems unappealing to me, since it's a very confusing name when used anywhere other than after a numeric literal.
A little late in getting back to this, but @not-an-aardvark appears to have the salient points well covered.
@littledan
Do you think local overrides of a variable named n should change the interpretation of BigInt literals?
Of course not, and I'm surprised that this was the leap you made from me saying "I don't think there needs to be any differentiation between user-defined and built-in ExtendedNumericLiteral", which is a statement explicitly and exclusively about the syntax, not the semantics, and not about what happens when user code does var n = "gotcha".
If we don't start the name with a prefix like _, we may need some other solution such as mangling, a separate namespace, or using a property of a shared object (all have complexity and downsides).
Can you give me an idea of what a "separate namespace" or "using a property on a shared object" would be like? Are we talking about something like:
global[Symbol.for("custom-literal-i")] = function() {
// foo bar, custom literal
}
let a = 123i;
// equivalent to let a = global[Symbol.for("custom-literal-i")]({value: "123", number: 123});
If so, what are the trade-offs here?
What would namespaces be like and what would their trade-offs be?
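One way the shared-object variant could work in today's JavaScript, assuming the key comes from `Symbol.for` (a plain `Symbol(...)` call creates a fresh symbol every time, so the definition and the lookup would never meet). The `"custom-literal-i"` key and the `{ value, number }` record follow the comment above; `desugar` is a hypothetical stand-in for what an engine would do:

```javascript
// Symbol() always returns a new, unique symbol; Symbol.for() uses a global registry.
console.log(Symbol("custom-literal-i") === Symbol("custom-literal-i"));         // false
console.log(Symbol.for("custom-literal-i") === Symbol.for("custom-literal-i")); // true

// Define the `i` literal on the shared global object.
globalThis[Symbol.for("custom-literal-i")] = function (lit) {
  return { re: 0, im: lit.number };
};

// Hypothetical engine step: `123i` -> look up the registered function and call it.
function desugar(source) {
  const match = /^([0-9]+)([A-Za-z]\w*)$/.exec(source);
  if (!match) throw new SyntaxError(`not an extended numeric literal: ${source}`);
  const [, digits, suffix] = match;
  const fn = globalThis[Symbol.for(`custom-literal-${suffix}`)];
  if (typeof fn !== "function") throw new TypeError(`no literal registered for ${suffix}`);
  return fn(Object.freeze({ value: digits, number: Number(digits) }));
}

// A local variable named `i` does not affect the lookup, which goes through globalThis.
const i = 0;
console.log(desugar("123i")); // { re: 0, im: 123 }
```

One visible trade-off of this sketch: every suffix is global, so two libraries that both register `i` would overwrite each other.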
One problem people bring up with this proposal is that it prevents TC39 from ever adding new literals to the language.
There are a few solutions to this. For example, user-defined ones could always override the built-in ones. But one I haven't seen mentioned is the one used by C++, where user-defined literals need a separator (123_px). Only the language spec can define ones with no separator. Thus there's no collision. I think hosts would also use this separator for their literals (e.g. CSS Typed OM defining a px literal), as that way hosts stay "just libraries" in some sense.
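A rough classifier for the C++-style rule, assuming a `_` directly after the digits marks a user- or host-defined suffix and its absence marks a language-defined one. Only decimal digits are handled, and `classifyLiteral` is a hypothetical name; a real grammar would still need to settle the numeric separator overlap (e.g. `1_000`) raised in this thread:

```javascript
// Hypothetical C++-style split: `123_px` is user/host-defined, `123n` is language-defined.
function classifyLiteral(source) {
  const match = /^([0-9]+)(_?)([A-Za-z$][A-Za-z0-9$_]*)$/.exec(source);
  if (!match) return null;
  const [, digits, separator, suffix] = match;
  return {
    number: Number(digits),
    suffix,
    owner: separator === "_" ? "user or host" : "language",
  };
}

console.log(classifyLiteral("123_px").owner); // user or host
console.log(classifyLiteral("123n").owner);   // language
```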