mstorsjo / llvm-mingw

An LLVM/Clang/LLD based mingw-w64 toolchain

-fno-short-wchar: Inconsistent wchar_t Definition with L-Prefix Strings #461

Open open-leocat opened 5 days ago

open-leocat commented 5 days ago

I encountered an issue with the -fno-short-wchar flag while using the latest release of Clang MinGW on Windows. This flag should ensure that wchar_t is 4 bytes instead of 2 bytes (which forces UTF-16 encoding). On Linux, using Clang 19, this flag behaves as expected.

However, in the newest release of Clang MinGW on Windows, the flag only partially changes the definition of wchar_t. The compiler now treats L-prefixed strings as arrays of int (i.e., 4 bytes per character), but the wchar_t type itself remains 2 bytes, because the C runtime headers typedef it as unsigned short. This inconsistency produces a compiler warning and means the flag breaks existing code that uses L-prefixed strings.

Test Code

#include <wchar.h>

int main() {
    wchar_t* string = L"a";

    return 0;
}

Compiler Output

main.c:5:11: warning: incompatible pointer types initializing 'wchar_t *' (aka 'unsigned short *') with an expression of type 'int[2]' [-Wincompatible-pointer-types]
    4 |         wchar_t* string = L"f";
      |                  ^        ~~~~
1 warning generated.

Problematic Lines in the Standard Library (corecrt.h, lines 95-100)

#ifndef _WCHAR_T_DEFINED
#define _WCHAR_T_DEFINED
#if !defined(__cplusplus) && !defined(__WIDL__)
typedef unsigned short wchar_t;
#endif /* C++ */
#endif /* _WCHAR_T_DEFINED */

Summary

Expected behaviour

The -fno-short-wchar flag should properly redefine wchar_t as 4 bytes across the entire system, including the standard library, to prevent mismatches.

Environment

open-leocat commented 5 days ago

I just realized that this is not actually the MinGW repository, but simply a repository for building the LLVM-MinGW-Compiler combination thingy.

However, this is still a Clang and UCRT incompatibility, so I am not sure whether this should actually be fixed, or whether it is acceptable that the feature is broken.

All this is ultimately caused by Microsoft opting for UTF-16 instead of something like UTF-8 or UTF-32 and hardcoding that choice into the UCRT. Maybe this could be patched, but I am sure that all the wchar functions would also stop working and would need to be patched as well. So one can only hope that one day the C23 UTF-8 functions will be properly implemented.

I will reopen this issue, so maybe someone else will decide what to do with this finding.

open-leocat commented 4 days ago

After playing around with this flag for a bit, I have also found a conflict between stddef.h and corecrt.h, which shows up when including stdint.h:

In file included from C:/Program Files/LLVM/lib/clang/19/include/stdint.h:56:
In file included from C:/Program Files/LLVM/include/stdint.h:32:
In file included from C:/Program Files/LLVM/lib/clang/19/include/stddef.h:103:
C:/Program Files/LLVM/lib/clang/19/include/__stddef_wchar_t.h:24:24: error: typedef redefinition with different types ('int' vs 'unsigned short')
   24 | typedef __WCHAR_TYPE__ wchar_t;
      |                        ^
C:/Program Files/LLVM/include/corecrt.h:98:24: note: previous definition is here
   98 | typedef unsigned short wchar_t;
      |                        ^
1 error generated.