php / doc-en

English PHP documentation

Constant Name Regular Expression Note #2772

Open Sunthief opened 1 year ago

Sunthief commented 1 year ago

From manual page: https://php.net/language.constants


I think the note concerning the possible names of constants is incorrect.

Note: For our purposes here, a letter is a-z, A-Z, and the ASCII characters from 128 through 255 (0x80-0xff).

As far as I know, the range 0x80-0xFF does not refer to ASCII/UTF code points, but to any byte from 10000000 to 11111111. At least that is how it works with variables and other names. I also tested it, and emoji work fine as names.

damianwadley commented 1 year ago

The note sounds weird to me, and it is out of place given that it relates to the "The name of a constant..." sentence a couple of paragraphs up, but what it is trying to say is correct: the bytes 0x80-0xFF are allowed. That is because PHP doesn't really care about character encoding in source files. Accented characters, emoji, whatever: it's the actual bytes that matter and affect validity.
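To make the "bytes, not characters" point concrete, here is a minimal sketch that prints the raw bytes of a few non-ASCII strings and checks that each byte falls in 0x80-0xFF. The helper name `allHighBytes` and the sample strings are made up for illustration; they are not from the manual.

```php
<?php
// Hypothetical helper: true when every byte of $s lies in 0x80-0xFF.
function allHighBytes(string $s): bool
{
    if ($s === '') {
        return false;
    }
    foreach (str_split($s) as $byte) {
        // ord() never exceeds 0xFF, so only the lower bound needs checking.
        if (ord($byte) < 0x80) {
            return false;
        }
    }
    return true;
}

$samples = [
    'e-acute, UTF-8'        => "\u{E9}",    // bytes C3 A9
    'e-acute, ISO 8859-1'   => "\xE9",      // byte E9
    'elephant emoji, UTF-8' => "\u{1F418}", // bytes F0 9F 90 98
];

foreach ($samples as $label => $bytes) {
    printf(
        "%-22s %-9s %s\n",
        $label,
        bin2hex($bytes),
        allHighBytes($bytes) ? 'all bytes in 0x80-0xFF' : 'has a byte below 0x80'
    );
}
```

The same accented letter produces different byte sequences in UTF-8 and ISO 8859-1, yet both pass, which is why the encoding of the source file does not matter for label validity.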

And the precise range of what "ASCII" covers probably depends on the person: it's mostly 0-127, sure, but 128-255 is part of the "extended" range so they kinda count too.

I think the source of confusion here is going to be mostly around the use of the term "characters". That should be dropped entirely and the unambiguous term "bytes" used instead. But my choice would be to remove this note entirely and rephrase the earlier paragraph to something along the lines of

The name of a constant follows the same rules as any label in PHP. A valid constant name starts with an ASCII letter or underscore, followed by any number of ASCII letters, numbers, or underscores. The bytes 0x80 through 0xFF, used by character encodings like UTF-8 and the ISO 8859 family, are allowed anywhere as well. As a regular expression, it would be expressed thusly: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$

Perhaps something even more than that which calls out PHP's encoding-agnosticism?

Additionally, the Variables > Basics page and the user-defined functions page repeat similar statements. There may be another page or two I'm not thinking of.
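The regular expression proposed above can be tried out with `preg_match`. This is only a sketch: the names `LABEL_PATTERN` and `isValidLabel` are hypothetical, and the `D` modifier is my addition so that `$` does not also match before a trailing newline. The pattern is used without the `u` modifier, so it is matched byte by byte.

```php
<?php
// The pattern from the proposed paragraph, wrapped in delimiters.
// Single quotes keep \x80 and \xff intact so PCRE interprets them as bytes.
const LABEL_PATTERN = '/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/D';

// Hypothetical helper: checks a name against the label pattern only.
// It does not check reserved words or whether the name is already defined.
function isValidLabel(string $name): bool
{
    return preg_match(LABEL_PATTERN, $name) === 1;
}

$candidates = [
    'FOO_1',       // ASCII letters, digit, underscore
    '1FOO',        // starts with a digit
    'foo-bar',     // contains a hyphen (0x2D)
    "CAF\u{C9}",   // "CAFÉ" encoded as UTF-8
    "CAF\xC9",     // "CAFÉ" encoded as ISO 8859-1
    "\u{1F418}",   // a single emoji, four UTF-8 bytes
];

foreach ($candidates as $name) {
    printf("%-10s %s\n", bin2hex($name), isValidLabel($name) ? 'valid' : 'invalid');
}
```

The names are printed as hex because the ISO 8859-1 sample is not valid UTF-8 and may not display correctly in a terminal.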

Sunthief commented 1 year ago

You might want to add to your paragraph that this means emojis and accented characters work as well, in case this is not clear enough.

damianwadley commented 1 year ago

> You might want to add to your paragraph that this means emojis and accented characters work as well, in case this is not clear enough.

Can we please not tell people that emoji are supported in names?

Sunthief commented 1 year ago

I think most would agree that it is not advisable, don't get me wrong. It might still be good to know in order to understand the concept.