microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.25k stars 1.51k forks source link

Visual Studio 2022 `std::system_category` returns "unknown error" if system locale is not en-US #3254

Open cppdev123 opened 2 years ago

cppdev123 commented 2 years ago

I download visual studio 2022 17.4.1 and was surprised to see that "unknown error" is returned from std::error_code using std::system_categoey() as the error category while it was reporting the correct error message before. After searching on the internet, I found someone complaining about the same issue here : "unknown error" from std::error_code on Windows The solution was to set the system locale to en-US from "administrative language settings".

I tested it on two windows installations: Windows 11 with locale set to "en-GB" and worked well when changed to "en-US" Windows 10 with locale set to "ar-EG" and worked well when changed to "en-US"

I have also visual studio 2019 installed and the binary produced by it works as expected without needing to change the system locale

The issue seems to have appeared in __std_system_error_allocate_message in syserror_import_lib.cpp after this commit : explicitly pass the system locale ID for system_categ…

To mention also: both windows installs have "English (United States)" as the "Windows display language" but had system locale different from "en-US"

EDIT: I set both display language and system locale to "ar-EG" but still "unknown error" is returned. Only when system locale is set to en-US the correct message is returned disregarding the display language

Also tracked by internal VSO-1735227 / AB#1735227 .

cppdev123 commented 2 years ago

This is what happens:

_NODISCARD size_t __CLRCALL_PURE_OR_STDCALL __std_system_error_allocate_message(
    const unsigned long _Message_id, char** const _Ptr_str) noexcept {
    // convert to name of Windows error, return 0 for failure, otherwise return number of chars in buffer
    // __std_system_error_deallocate_message should be called even if 0 is returned
    // pre: *_Ptr_str == nullptr
    DWORD _Lang_id;
    /* GetLocaleInfoEx succeeds and sets _Lang_id to 3073 if system locale is ar-EG or 1033 if en-US */
    const int _Ret = GetLocaleInfoEx(LOCALE_NAME_SYSTEM_DEFAULT, LOCALE_ILANGUAGE | LOCALE_RETURN_NUMBER,
        reinterpret_cast<LPWSTR>(&_Lang_id), sizeof(_Lang_id) / sizeof(wchar_t));
    if (_Ret == 0) {
        _Lang_id = 0;
    }

    /* FormatMessageA fails if id is not 1033 because if language id is not 0 it will only try to load  error message in this language and my windows installations seems not to have these strings in arabic or even in en-GB! */
    const unsigned long _Chars =
        FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
            nullptr, _Message_id, _Lang_id, reinterpret_cast<char*>(_Ptr_str), 0, nullptr);

    return _CSTD __std_get_string_size_without_trailing_whitespace(*_Ptr_str, _Chars);
}

from FormatMessageA doc:

If you pass a specific LANGID in this parameter, FormatMessage will return a message for that LANGID only. If the function cannot find a message for that LANGID, it sets Last-Error to ERROR_RESOURCE_LANG_NOT_FOUND. If you pass in zero, FormatMessage looks for a message for LANGIDs in the following order:

Language neutral Thread LANGID, based on the thread's locale value User default LANGID, based on the user's default locale value System default LANGID, based on the system default locale value US English If FormatMessage does not locate a message for any of the preceding LANGIDs, it returns any language message string that is present. If that fails, it returns ERROR_RESOURCE_LANG_NOT_FOUND.

So if language id was 0 as before the mentioned commit, I'll get at least a message in US English if I don't have the messages in my system locale language and this is why it was working before.

Now FormatMessageA will try to only load load error messages in my system locale "ar-EG" which I don't have thus system category will give me "unknown error"

This is the second time I'm hit by locale issues. Please test locale related code on different locales other than "en-US" before shipping code!

fsb4000 commented 2 years ago

This is the second time I'm hit by locale issues. Please test locale related code on different locales other than "en-US" before shipping code!

We do. I think at least "ru-Ru" and "zh-CN" are tested.

fsb4000 commented 2 years ago

FormatMessageA fails if id is not 1033 because if language id is not 0 it will only try to load error message in this language and my windows installations seems not to have these strings in arabic or even in en-GB!

#include <clocale>
#include <iostream>
#include <string>

#include <Windows.h>

int main() {
    setlocale(LC_ALL, "");
    auto file_handle(CreateFile(
        L"D:\\non_exist_file.txt", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 
        FILE_ATTRIBUTE_NORMAL, NULL));
    if (file_handle == INVALID_HANDLE_VALUE) {
        const auto error_code = GetLastError();
        std::string errmsg = std::system_category().message(error_code);
        std::cout << errmsg << '\n';
    }
}

изображение

#include <clocale>
#include <windows.h>
#include <system_error>
#include <stdio.h>

int main()
{
    setlocale(LC_ALL, "");
    std::error_code ec(ERROR_DIR_NOT_EMPTY, std::system_category());
    printf("message = '%s' (%lld)\n", ec.message().c_str(), static_cast<long long>(_MSC_FULL_VER));
}

изображение

I will investigate what to do if windows installations do not have error messages in the selected language.

cppdev123 commented 2 years ago

The error set by FormatMessageA is 15100 (ERROR_MUI_FILE_NOT_FOUND) although I installed the Arabic language pack, and I can set Arabic as the display language for windows and windows apps including explorer shows proper support for Arabic. Maybe microsoft didn't publish the MUI file containing error messages in Arabic?

Can you fallback to language id 0 if retried with language id other than 0 and failed with ERROR_MUI_FILE_NOT_FOUND or ERROR_RESOURCE_LANG_NOT_FOUND ?

dovdiienko-el3 commented 1 year ago

On my computer the GetLocaleInfoEx() function returns 1049 (Russian). The interface language is English.

Because the interface language is English, I expect std::error_code(996, std::system_category()).message() function to return a message in English. In fact it tries to return in Russian. Because the language files for the Russian locale are not installed, the FormatMessageA function returns "unknown message".

fsb4000 commented 1 year ago

Because the interface language is English, I expect std::error_code(996, std::system_category()).message() function to return a message in English.

In https://github.com/microsoft/STL/issues/2451 , the user expected that if his interface language is Chinese and system default locale is English the message should be returned in English.

And it seems we can't even use English always: https://github.com/microsoft/STL/pull/3260#discussion_r1038916530

dovdiienko-el3 commented 1 year ago

@fsb4000 , in my case:

dovdiienko-el3 commented 1 year ago

I know it is not back compatible, but I believe during the application initialization on startup the C++ locale must be set to the OS User Language.

Later std::error_code::message should return the message based on C++ locale rather than Windows Settings, as locale can be modified by application.

trueqbit commented 8 months ago

I have the same problem, with English (Australian) and other English locales.

TBH I expect FormatMessage() to always succeed when specifying the system locale, especially so when the proper "language pack" is installed. The error seems to lie in the fact that certain language environments do not have their own KernelBase.dll.mui, FormatMessage() should take care of this in any case.

@dovdiienko-el3 I don't think the C++ locale or specifically std::system_category::message() should return a message in the user's locale. After all, it is a system error. Also, some Windows APIs, e.g. for services, work with the system locale, e.g. for string comparison. So again, I expect retrieving the message for a system error using the system locale to just work.

@StephanTLavavej Can you escalate the problem to the Windows SDK team on what to do?

davemcincork commented 8 months ago

@StephanTLavavej - This is a big issue for any developer whose system locale isn't set to en-US. It was definitely introduced in the VS2022 (v143) platform toolset and doesn't exist in its predecessors. I've done a lot below to help you repro and even fix. I'm sure everyone in this thread would really appreciate if you could bring this to the attention of the relevant team.

This min repro program will produce different output with different platform toolsets:

#include <system_error>
#include <cstdio>

int main()
{
    std::printf("Oops: %s\n", std::system_category().message(2).c_str());
}

I've avoided including Windows.h here for brevity but error 2 is ERROR_FILE_NOT_FOUND. My display language is en-IE and my system locale is en-GB. If the above is built in VS2022 but using either the VS2017 (v141) or VS2019 (v142) platform toolset, it produces the following output as expected:

Oops: The system cannot find the file specified.

But if built using the VS2022 (v143) platform toolset it produces the following output:

Oops: unknown error

The problem seems to be in your internal function __std_system_error_allocate_message which is defined in a source file named syserror_import_lib.cpp. Here's an x64 disassembly of the implementation in VS2017/2019 up to the point it calls FormatMessageA:

00007FF6304B5820  mov         qword ptr [rsp+10h],rdx  
00007FF6304B5825  mov         dword ptr [rsp+8],ecx  
00007FF6304B5829  sub         rsp,58h  
00007FF6304B582D  mov         qword ptr [rsp+30h],0  
00007FF6304B5836  mov         dword ptr [rsp+28h],0  
00007FF6304B583E  mov         rax,qword ptr [_Ptr_str]  
00007FF6304B5843  mov         qword ptr [rsp+20h],rax  
00007FF6304B5848  xor         r9d,r9d  
00007FF6304B584B  mov         r8d,dword ptr [_Message_id]  
00007FF6304B5850  xor         edx,edx  
00007FF6304B5852  mov         ecx,1300h  
00007FF6304B5857  call        qword ptr [__imp_FormatMessageA (07FF6304C7008h)]

Note the xor r9d, r9d instruction which has the effect of passing a hard-coded LANGID of 0 to FormatMessageA. Compare the above to the supposedly locale-aware implementation in VS2022:

00007FF7F14455F0  mov         qword ptr [rsp+10h],rdx  
00007FF7F14455F5  mov         dword ptr [rsp+8],ecx  
00007FF7F14455F9  sub         rsp,58h  
00007FF7F14455FD  mov         r9d,2  
00007FF7F1445603  lea         r8,[_Lang_id]  
00007FF7F1445608  mov         edx,20000001h  
00007FF7F144560D  lea         rcx,[string L"!x-sys-default-\x4000\0\0\0\0"... (07FF7F144F578h)]  
00007FF7F1445614  call        qword ptr [__imp_GetLocaleInfoEx (07FF7F1457010h)]  
00007FF7F144561A  mov         dword ptr [_Ret],eax  
00007FF7F144561E  cmp         dword ptr [_Ret],0  
00007FF7F1445623  jne         __std_system_error_allocate_message+3Dh (07FF7F144562Dh)  
00007FF7F1445625  mov         dword ptr [_Lang_id],0  
00007FF7F144562D  mov         qword ptr [rsp+30h],0  
00007FF7F1445636  mov         dword ptr [rsp+28h],0  
00007FF7F144563E  mov         rax,qword ptr [_Ptr_str]  
00007FF7F1445643  mov         qword ptr [rsp+20h],rax  
00007FF7F1445648  mov         r9d,dword ptr [_Lang_id]  
00007FF7F144564D  mov         r8d,dword ptr [_Message_id]  
00007FF7F1445652  xor         edx,edx  
00007FF7F1445654  mov         ecx,1300h  
00007FF7F1445659  call        qword ptr [__imp_FormatMessageA (07FF7F1457008h)]

Instead of using a hard-coded LANGID of 0, this is getting the LANGID from the system locale by calling GetLocaleInfoEx. This is correctly returning 2057 (0x0809 -> en-GB) on my machine, but FormatMessageA is failing to map this to a message.

I think the approach here should be to try progressively more general LANGIDs until a mapping is found. In other words, start with the full LANGID which will consist of a specific primary language and sub-language. If that doesn't find a mapping, drop back to trying the primary language and a default sub-language. If that fails, drop back to primary language and neutral sub-language. Finally, if that fails, drop back to neutral primary language and neutral sub-language, i.e. 0.

FWIW, this is how I'm doing it:

template <class CharT>
constexpr auto GetFormatMessageFunction()
{
    if constexpr (std::is_same_v<CharT, char>)
    {
        return &FormatMessageA;
    }
    else
    {
        return &FormatMessageW;
    }
}

template <class CharT = char>
std::basic_string<CharT> GetSystemErrorMesssage(const DWORD errorCode)
{
    const auto formatMessageFunction = GetFormatMessageFunction<CharT>();

    DWORD langId;
    if (0 == GetLocaleInfoEx(LOCALE_NAME_SYSTEM_DEFAULT,
                             LOCALE_ILANGUAGE |
                             LOCALE_RETURN_NUMBER,
                             reinterpret_cast<LPWSTR>(&langId),
                             sizeof(langId) / sizeof(wchar_t)))
    {
        langId = MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL);
    }

    std::basic_string<CharT> result;
    for (;;)
    {
        CharT* msg;
        const auto msgLen = formatMessageFunction(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                                  FORMAT_MESSAGE_FROM_SYSTEM |
                                                  FORMAT_MESSAGE_IGNORE_INSERTS,
                                                  nullptr,
                                                  errorCode,
                                                  langId,
                                                  reinterpret_cast<CharT*>(&msg),
                                                  0,
                                                  nullptr);
        if (msgLen != 0)
        {
            try
            {
                result.assign(msg, msgLen);
            }
            catch (...)
            {
                LocalFree(msg);
                throw;
            }
            LocalFree(msg);
            break;
        }
        const auto primaryLangId = PRIMARYLANGID(langId);
        if (primaryLangId == LANG_NEUTRAL)
        {
            break;
        }
        const auto subLangId = SUBLANGID(langId);
        if (subLangId == SUBLANG_NEUTRAL)
        {
            langId = MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL);
        }
        else if (subLangId == SUBLANG_DEFAULT)
        {
            langId = MAKELANGID(primaryLangId, SUBLANG_NEUTRAL);
        }
        else
        {
            langId = MAKELANGID(primaryLangId, SUBLANG_DEFAULT);
        }
    }
    for (auto charIndex = result.size() - 1; charIndex != MAXSIZE_T; charIndex--)
    {
        if (result[charIndex] != '\n' && result[charIndex] != '\r')
        {
            break;
        }
        result.resize(charIndex);
    }
    return result;
}
ujos commented 8 months ago

It is hard to believe it is true. This kind of problem must be treated as a show stopper from my point of view and has to be fixed ASAP. In fact 1.5 years passed since this issue had been reported and it is still actual.

This issue prevents developers from using the std::system_category() in their code. Instead they have to implement their own for Windows.

b-spencer commented 7 months ago

I've read through #3260 and FWIW, it seems like it will fix the problem.

Do I understand correctly that the workarounds for this issue are:

  1. Set the system (not even user) language ID to one that corresponds to an installed set of language error strings. (Is it even possible to install error strings for languages like "en-CA"?).
  2. Use a version of the toolchain prior to the one that comes with VS 17.3.0.
  3. Don't use std::error_code::message() or related functionality.

None of those seem very practical. Is there anything else I can do before #3260 becomes generally available? Thanks.

davemcincork commented 6 months ago

None of those seem very practical. Is there anything else I can do before #3260 becomes generally available? Thanks.

There is. Just define a custom implementation of std::error_category like this:

class LocaleSafeSystemCategory : public std::error_category
{
    const char* name() const noexcept override
    {
        return "locale-safe system error category";
    }

    std::string message(const int errorValue) const override
    {
        // The implementation of this would be like the GetSystemErrorMesssage
        // function I shared above. It might be better move this to a .cpp file
        // because it avoids having a Win32 dependency in anything that includes
        // the header defining this class.
    }
};

Then, wherever you would normally do something like this:

throw std::system_error(GetLastError(),
                        std::system_category(),
                        "CreateFileA");

You would instead in theory do this:

throw std::system_error(GetLastError(),
                        LocaleSafeSystemCategory{},
                        "CreateFileA");

I say in theory because, as is common in many codebases, I tend to never directly do a throw std::system_error in my code. I instead have a little family of inter-related utility macros which do some value add and improve legibility. If you have something similar this means you can work around the locale issue without having to make any changes at the call sites. My helper macros are defined as shown below. Note that it only uses the custom std::error_category if being compiled with VS2022. Whenever this bug is fixed, I will adjust the #if test accordingly:

#define THROW_SYSTEM_ERROR(name, value, category) \
    do \
    { \
        throw std::system_error(value, \
                                category, \
                                #name "() failed in " __FUNCTION__ "()"); \
    } while (false)

#if _MSC_VER >= 1930
#   define THROW_WIN32_ERROR(name, win32ErrorValue) \
        THROW_SYSTEM_ERROR(name, win32ErrorValue, LocaleSafeSystemCategory{})
#else
#   define THROW_WIN32_ERROR(name, win32ErrorValue) \
        THROW_SYSTEM_ERROR(name, win32ErrorValue, std::system_category())
#endif

#define THROW_LAST_WIN32_ERROR(name) \
    THROW_WIN32_ERROR(name, GetLastError())

#define THROW_POSIX_ERROR(name, posixErrorValue) \
    THROW_SYSTEM_ERROR(name, posixErrorValue, std::generic_category())

#define THROW_LAST_POSIX_ERROR(name) \
    THROW_POSIX_ERROR(name, errno)
b-spencer commented 6 months ago

There is. Just define a custom implementation of std::error_category like this:

That's a good idea. Thanks for clearly explaining it!

Unfortunately, in my case, I don't already have an existing customization point like your macro, and I'd have a lot of code to audit.

What I did discover, though, is that if you define implementations of all three of the __std_system_error_* functions together in your own .obj file (even inside your own static library), then MSVC's linker seems to be happy to silently use your implementation instead of those in msvcprt.lib (or msvcprtd.lib for debug builds). Presumably, it finds that first and then ignores the duplicate object file in the toolchain's static library. So far, this seems to "just work", without any /force flags or warnings on the link, and some searching around reveals that this is long-standing behaviour of the toolchain. Imagine my surprise.