mycoboco / beluga

a standard C compiler (with an integrated preprocessor)
http://code.woong.org/beluga
Other

use locale info to preset input character set #132

Closed: mycoboco closed this issue 6 years ago

mycoboco commented 6 years ago

The format of LC_CTYPE is

[language[_territory][.codeset][@modifier]]

and the code to extract the codeset from it should be roughly:

if (ctype && ctype has .) {
    ctype = ctype.substring(ctype.indexOf('.') + 1);
    return (ctype has @)? ctype.substring(0, ctype.indexOf('@')): ctype;
}
return not found;

How GCC handles these environment variables (from the GCC manual):

LANG
LC_CTYPE
LC_MESSAGES
LC_ALL
    These environment variables control the way that GCC uses localization information which allows GCC to work with
    different national conventions.  GCC inspects the locale categories LC_CTYPE and LC_MESSAGES if it has been
    configured to do so.  These locale categories can be set to any value supported by your installation.  A typical
    value is en_GB.UTF-8 for English in the United Kingdom encoded in UTF-8.

    The LC_CTYPE environment variable specifies character classification.  GCC uses it to determine the character
    boundaries in a string; this is needed for some multibyte encodings that contain quote and escape characters that
    are otherwise interpreted as a string end or escape.

    The LC_MESSAGES environment variable specifies the language to use in diagnostic messages.

    If the LC_ALL environment variable is set, it overrides the value of LC_CTYPE and LC_MESSAGES; otherwise, LC_CTYPE
    and LC_MESSAGES default to the value of the LANG environment variable.  If none of these variables are set, GCC
    defaults to traditional C English behavior.
mycoboco commented 6 years ago

The encoding determined from those variables should be used as the default for both the input and the execution character sets.