sg16-unicode / sg16

Migration challenges following P1949 changes to identifier syntax #79

Open tahonermann opened 2 years ago

tahonermann commented 2 years ago

WG21 adopted P1949 as a defect report during the June 2021 virtual plenary. Since then, several programmers have reported that identifiers they had been using, which were valid under the previously used immutable identifier syntax, are rejected under the default identifier syntax rules now in effect.

GCC implemented P1949 for the GCC 12 release but diagnoses identifiers that are no longer permitted only when invoked with -pedantic. See https://godbolt.org/z/4aqbzfEf1.

Clang implemented P1949 for the Clang 14 release and diagnoses identifiers that are no longer permitted as an error, with no option to downgrade or disable the error. See https://godbolt.org/z/hK4oWMsnP.

MSVC does not appear to have implemented P1949 as of its 19.32 release. See https://godbolt.org/z/Pve5chqEW.

A few Clang users have reported the impact on their existing projects to the Clang maintainers at https://github.com/llvm/llvm-project/issues/54732. The Clang maintainers have not yet added a backward compatibility option but are leaning towards changing the diagnostic to a warning that defaults to an error so that it can be suppressed.

Tom has been meeting with the Unicode Source Code Ad Hoc Group (USCAHG: a group within the Unicode Consortium formed following the publication of the Trojan Source suite of vulnerabilities; see Unicode document L2/22-007) to discuss these and other issues related to source code as text. The group will be reviewing character allowances in place for default identifier syntax and likely revising them for a future Unicode Standard to include some characters, like mathematical symbols, that are permitted in immutable identifier syntax but not currently in default identifier syntax. An initial survey of such characters is available in Unicode document L2/22-102.

Clang maintainers are aware of the USCAHG plans and, specifically, that a future Unicode Standard may expand the allowances in default identifier syntax such that some identifiers that were permitted before P1949 and rejected afterwards will be permitted again at some point in the future. The maintainers want to avoid having to maintain options that enable/disable sets of identifiers over time.

Robin Leroy (one of the USCAHG chairs) suggested that it may be helpful to C++ implementors to publish a WG21 paper to inform them of this situation and to offer advice regarding how they may best assist their users in migrating to the new identifier syntax. This issue has been filed to follow up on this suggestion.

tahonermann commented 2 years ago

One report of negative impact came from Dennis Ogiermann via a post to the SG16 mailing list on August 11th, 2022 (on request after having first filed SG16 issue #77). In that post, Dennis presents a case for continuing to allow mathematical symbols and superscript and subscript letters, numbers, and symbols in identifiers.
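
To make the superscript and subscript case concrete, here is a minimal sketch of a scanner that a project could run over its sources to locate such characters before moving to a compiler that enforces P1949. It is an illustration only, not something proposed in this thread: the names `is_super_or_subscript_digit`, `Finding`, and `find_flagged_code_points` are hypothetical, and the table covers only the superscript and subscript digits (general category No, so without the XID_Continue property), not the full set of affected characters. It assumes UTF-8 input and does not tokenize, so hits inside comments and string literals are reported too.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: true for superscript and subscript digits only.
// This is a small illustrative excerpt, NOT the complete set of characters
// affected by P1949.
constexpr bool is_super_or_subscript_digit(char32_t cp) {
    return cp == 0x00B2 || cp == 0x00B3 || cp == 0x00B9 ||  // superscript 2, 3, 1
           cp == 0x2070 || (cp >= 0x2074 && cp <= 0x2079) || // superscript 0, 4-9
           (cp >= 0x2080 && cp <= 0x2089);                   // subscript 0-9
}

// One flagged character: where it starts in the input and what it is.
struct Finding {
    std::size_t byte_offset;
    char32_t code_point;
};

// Decodes UTF-8 and records every flagged code point. Bytes that do not
// form a structurally valid sequence are skipped one at a time; overlong
// encodings and surrogates are not rejected in this sketch.
std::vector<Finding> find_flagged_code_points(std::string_view text) {
    std::vector<Finding> findings;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t length = 0;
        char32_t cp = 0;
        if (lead < 0x80) {
            length = 1; cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4; cp = lead & 0x07;
        } else {
            ++i;  // stray continuation byte or invalid lead byte
            continue;
        }
        if (i + length > text.size()) {
            break;  // truncated sequence at end of input
        }
        bool well_formed = true;
        for (std::size_t k = 1; k < length; ++k) {
            const unsigned char cont = static_cast<unsigned char>(text[i + k]);
            if ((cont & 0xC0) != 0x80) {
                well_formed = false;
                break;
            }
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (!well_formed) {
            ++i;
            continue;
        }
        if (is_super_or_subscript_digit(cp)) {
            findings.push_back({i, cp});
        }
        i += length;
    }
    return findings;
}
```

Called on the bytes of a line such as `double x² = 4.0;`, the function reports one finding for U+00B2 at the byte offset where the character starts. A real migration tool would more likely rely on the compiler's own diagnostics where they are available, for example GCC 12 with -pedantic as noted above.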

Dennis has also been participating in the previously linked Clang issue. At the time of this writing, it appears that four Clang users have voiced concerns there.