rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Variable named `∇x` gives "unknown start of token" compiler error #120142

Open Danvil opened 9 months ago

Danvil commented 9 months ago

I tried this code:

let ∇x = 1;

I expected the code to compile, but instead I get the compiler error message "unknown start of token \u{2207}".

This is surprising as variable names starting with Greek letters are fine:

let Δλ = 1;

I believe the cause is that Rust identifiers need to start with an XID_Start Unicode character; however, the "Nabla" ∇ (0x2207) does not seem to be on that list.

It would be great to have the "Nabla" operator as a valid start character for identifiers, as it is very commonly used in physics and mathematics to denote the derivative of a multi-variable function.

A possible workaround is to use the "Canadian syllabics e" ᐁ (0x1401).
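To make the difference between the three characters concrete, here is a minimal sketch. It assumes that the standard library's `char::is_alphabetic` (the Unicode `Alphabetic` property) is a good enough stand-in for `XID_Start` for these particular code points; it is not the check the compiler actually performs, and the function name `can_start_identifier_approx` is made up for illustration.

```rust
/// Rough approximation of "may this character start an identifier?".
/// ASSUMPTION: `char::is_alphabetic` only approximates `XID_Start`;
/// the real lexer uses dedicated Unicode tables, not this check.
fn can_start_identifier_approx(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

fn main() {
    // Greek letters, the nabla operator, and the Canadian syllabics workaround.
    for c in ['Δ', 'λ', '∇', 'ᐁ'] {
        println!(
            "U+{:04X} {} -> {}",
            c as u32,
            c,
            can_start_identifier_approx(c)
        );
    }
}
```

Under this approximation Δ, λ and ᐁ are letters and pass, while ∇ is a math symbol and is rejected, which matches the behaviour described above.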

Jules-Bertholet commented 9 months ago

See https://github.com/rust-lang/rfcs/issues/3402 and https://www.unicode.org/reports/tr31/proposed.html#Mathematical_Compatibility_Notation_Profile

@rustbot label T-lang -C-bug C-enhancement

Manishearth commented 9 months ago

This would need an RFC to extend the current identifier profile (the default one from UAX 31) to use the mathematical notation profile.

This would add these characters to the identifier profile (with the superscripts and subscripts not being allowed at the beginning of an identifier).

All of these would get linted on by the uncommon_codepoints lint since they have Identifier_Type=Not_NFKC.

(A change I want to make is for uncommon_codepoints to have slightly different lint text based on the category that is triggered: https://github.com/rust-lang/rust/issues/120228)
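The start/continue split described above (extra characters join the profile, but superscripts and subscripts may not begin an identifier) could be sketched roughly as follows. Everything here is an assumption for illustration: the two character lists are tiny hypothetical samples and not the profile's real contents, `is_valid_identifier_sketch` is an invented name, and the base rules are approximated with `char::is_alphabetic` and ASCII digits rather than the real XID tables.

```rust
/// HYPOTHETICAL sample of characters allowed anywhere in an identifier.
const EXTRA_START: &[char] = &['∇'];
/// HYPOTHETICAL sample of characters allowed only after the first position.
const EXTRA_CONTINUE_ONLY: &[char] = &['²', '₁'];

/// Approximate base rule for the first character (stand-in for XID_Start).
fn base_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Approximate base rule for later characters (stand-in for XID_Continue).
fn base_continue(c: char) -> bool {
    base_start(c) || c.is_ascii_digit()
}

/// Sketch of an identifier check with the extended sets.
/// Ignores keywords and the special case of a lone `_`.
fn is_valid_identifier_sketch(s: &str) -> bool {
    let mut chars = s.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return false, // the empty string is never an identifier
    };
    if !(base_start(first) || EXTRA_START.contains(&first)) {
        return false;
    }
    chars.all(|c| {
        base_continue(c) || EXTRA_START.contains(&c) || EXTRA_CONTINUE_ONLY.contains(&c)
    })
}

fn main() {
    for name in ["∇x", "x²", "²x", "Δλ"] {
        println!("{name}: {}", is_valid_identifier_sketch(name));
    }
}
```

With these sample sets, `∇x` and `x²` are accepted while `²x` is rejected, because the superscript is only permitted in a non-initial position.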

bend-n commented 9 months ago

Hey, if we're getting mathematical characters, can I just say I would really love it if we had ¬. Is there a Unicode profile for these?

Also what about the emoji profile 😀

Manishearth commented 9 months ago

There isn't one for the math operators because those are considered operator-like.

As for emoji, it's unlikely. Rust would have to put together its own set.

Emoji identifiers are a complicated can of worms.

CraftSpider commented 9 months ago

Personally I'd be interested in math operators at least tokenizing, so they could reach macros as Punct or such, but I figure that's somewhat unlikely to actually happen.