skeeto / w64devkit

Portable C and C++ Development Kit for x64 (and x86) Windows
The Unlicense

Reliably targeting CRTs other than msvcrt? #7

Open jonforums opened 3 years ago

jonforums commented 3 years ago

I'm toying with w64devkit to see whether it creates plugin DLLs that work with a 3rd party app that appears to have been compiled with VC++ 2010/11. The following trivial experiment suggests that a spec file tweak can avoid these types of problems by forcing a link against a specific CRT (msvcr{80,90,100,110,120}). If that is truly needed, do you recommend anything other than spec file tweaks when using w64devkit?

I haven't tried UCRT with w64devkit yet, but this and what I see in <INSTALL>/x86_64-w64-mingw32/lib look promising.

OT...like your blog, this one was great.

$ gcc --version | head -1 && gcc -dumpspecs | grep -C 1 msvc
gcc (GCC) 11.1.0
*libgcc:
%{mthreads:-lmingwthrd} -lmingw32     -lgcc     -lmoldname -lmingwex -lmsvcrt -lkernel32

$ cat msvcr100.spec
*libgcc:
%{mthreads:-lmingwthrd} -lmingw32     -lgcc     -lmoldname -lmingwex -lmsvcr100 -lkernel32

$ gcc -Wall -Wextra -O2 -s -specs=msvcr100.spec -o min.exe min.c
min.c: In function 'main':
min.c:3:14: warning: unused parameter 'argc' [-Wunused-parameter]
    3 | int main(int argc, char *argv[])
      |          ~~~~^~~~
min.c:3:26: warning: unused parameter 'argv' [-Wunused-parameter]
    3 | int main(int argc, char *argv[])
      |                    ~~~~~~^~~~~~

$ ./min.exe
Hello min!

$ objdump -x min.exe | grep -i 'dll name'
        DLL Name: KERNEL32.dll
        DLL Name: msvcr100.dll
skeeto commented 3 years ago

The links you shared cover the core issues: don't share CRT objects, and don't allocate/free across CRT boundaries. If you can avoid these problems, then it won't matter that your DLL links against the old msvcrt.dll. In your case, though, this depends on the API defined by the third party, and you might not have a choice.
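To make "don't allocate/free across CRT boundaries" concrete, here is a minimal sketch of the usual pattern: a DLL that returns heap memory also exports the function that releases it, so `malloc` and `free` always come from the same CRT, whichever one the DLL was linked against. The names `plugin_greeting`, `plugin_free`, and `PLUGIN_API` are hypothetical, not part of any real plugin API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical export macro: on Windows the functions are exported from the DLL. */
#ifdef _WIN32
#  define PLUGIN_API __declspec(dllexport)
#else
#  define PLUGIN_API
#endif

/* Returns "Hello, <name>" allocated with *this module's* CRT.
   The caller must release it with plugin_free(), never with its own free(). */
PLUGIN_API char *plugin_greeting(const char *name)
{
    static const char prefix[] = "Hello, ";
    size_t n = strlen(name);
    char *s = malloc(sizeof prefix + n);  /* sizeof prefix counts the NUL */
    if (s) {
        memcpy(s, prefix, sizeof prefix - 1);
        memcpy(s + sizeof prefix - 1, name, n + 1);  /* copies name's NUL too */
    }
    return s;
}

/* Frees memory with the same CRT that allocated it. */
PLUGIN_API void plugin_free(void *p)
{
    free(p);
}
```

With this shape, the host can be built against msvcr100.dll while the DLL uses msvcrt.dll or UCRT, because neither side ever frees the other's memory.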

Beyond this I can't really provide any particular advice. I didn't know it was so relatively easy to link a different CRT, including UCRT. (I learned something new today, thanks!) I've just never needed to do so yet. I tried your example myself, and the resulting DLL seems to be in good order and well behaved.

I successfully used your example to also build a working UCRT-linked DLL (-lucrt). As for UCRT, I still do not really understand the UCRT issues discussed in the article you linked, nor how it's supposed to resolve them. I suppose that means the DLL I just built might break someday?

One word of warning: be mindful when using pthreads (including OpenMP) as bundled with w64devkit. It specifically requires some version of msvcrt. If you use pthreads and want to link against UCRT, you'd need to link against both msvcrt and UCRT, which… probably isn't a good idea within a single module. Since UCRT support is so new, there are likely other parts of Mingw-w64 with similar UCRT incompatibilities. For instance, the latest Mingw-w64 release (last week), which w64devkit won't be using for some time, includes UCRT updates and fixes.

jonforums commented 3 years ago

Thanks for double-checking w64devkit's spec file support. It now looks like I don't need to override the CRT: the 3rd party appears to have both legacy and new APIs, and the new API doesn't have cross-CRT boundary problems. I'll soon find out.

As in this made-up snippet, plugin DLLs return data either via globals or via heap memory managed by the 3rd party's allocators. Presumably, the 3rd party frees the memory referenced by the struct pointer. If so, my msvcrt version concern has just evaporated.

#include <stddef.h>  // size_t
#include <string.h>  // strcpy

typedef unsigned char BYTE;
typedef char *LPSTR;

struct tagFooStruct
{
    size_t toSize;    // Bytes sent to DLL
    size_t fromSize;  // Bytes returned from DLL
    BYTE  *dataTo;    // Data sent to DLL
    BYTE  *dataFrom;  // Data returned from DLL
};
typedef struct tagFooStruct FOO;
typedef struct tagFooStruct *LPFOO;

// return data via a global buffer
char funcName[256];

// metadata func that provides the name of the custom "Hello" function;
// a similar metadata func provides its return type
LPSTR GetFuncName(void)
{
    return strcpy(funcName, "Hello");
}

// custom plugin func that, if needed, returns heap data via a struct and uses the
// 3rd party's custom memory allocator to get the memory for the returned data
void Hello(LPFOO foo)
{
    // Call the 3rd party's custom memalloc/memrealloc as needed, then
    // update `foo` before returning, e.g.:
    //   foo->dataFrom = <memory from the 3rd party's allocator>;
    //   foo->fromSize = <number of bytes written>;
    (void)foo;  // placeholder until the allocator calls are filled in
}
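One way a plugin like this can fill `dataFrom` without its own CRT ever owning the memory is for the host to hand over its allocator. The sketch below assumes a hypothetical `SetHostAllocator` registration call and `HostAllocFn` type; the real 3rd party API may look quite different, and this `Hello` returns a status code, unlike the `void` version above.

```c
#include <stddef.h>
#include <stdlib.h>  /* malloc/free stand in for the host's allocator when testing */
#include <string.h>

typedef unsigned char BYTE;

struct tagFooStruct {
    size_t toSize;    /* bytes sent to DLL */
    size_t fromSize;  /* bytes returned from DLL */
    BYTE  *dataTo;    /* data sent to DLL */
    BYTE  *dataFrom;  /* data returned from DLL */
};
typedef struct tagFooStruct FOO, *LPFOO;

/* Hypothetical: the host registers its allocator when it loads the plugin. */
typedef void *(*HostAllocFn)(size_t size);
static HostAllocFn host_alloc;

void SetHostAllocator(HostAllocFn fn) { host_alloc = fn; }

/* Echoes the input back in memory owned by the host's CRT, so the host
   can release it with its own free. Returns 0 on success, -1 on failure. */
int Hello(LPFOO foo)
{
    if (!host_alloc) return -1;
    BYTE *out = host_alloc(foo->toSize);
    if (!out && foo->toSize) return -1;
    if (foo->toSize) memcpy(out, foo->dataTo, foo->toSize);
    foo->dataFrom = out;
    foo->fromSize = foo->toSize;
    return 0;
}
```

Because every allocation goes through `host_alloc`, the plugin can link against any CRT without creating a cross-CRT free.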

As for UCRT, I also haven't spent much time investigating. Since your freestanding Windows exe blog post, I try to use the Win32 API exclusively unless I need other libs that make that impossible. That said, I do find this and this and this interesting.

It seems the UCRT idea is to always link with an import lib that references runtime shim DLLs (the api-ms-win-crt-*.dll API sets), which do the magic of forwarding each symbol to the "correct" one in the real implementation DLL (ucrtbase.dll). If so, then as long as (a) the shim DLLs exist, and (b) you don't link directly to the real implementation DLL (as many mingw-w64 users are used to doing), your UCRT-linked DLL likely won't break in the future. But what are the non-obvious corner cases?

This topic is an esoteric one, but it can be critical for integrating with older 3rd party "enterprise" apps. And back when I cared about Ruby, I recall it being a concern for C-based gems.

Who knows, maybe the topic is interesting enough to one day show up in your ongoing w64devkit blog series 😺

GalaxySnail commented 1 year ago

c17f5ca2b404a1b2ed973a0f8f352a6e0e40cb43 mentioned that:

The Mingw-w64 documentation vaguely mentions a UTF-8 locale, but it is either false or useless depending on the meaning. UCRT has all the same narrow API limitations of MSVCRT — the biggest and thorniest issue with Windows CRTs.

If I understand correctly, UCRT actually works with setlocale(LC_ALL, ".UTF-8") (while msvcrt doesn't), which can be useful for portable programs. For example,

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#ifdef _WIN32
#  include <windows.h>  // SetConsoleOutputCP
#endif

#ifdef _WIN32
#  define WANT_LOCALE ".UTF-8"
#  define LOCALE_ERROR_MSG "UTF-8 locale is not supported."
#else
#  define WANT_LOCALE ""
#  define LOCALE_ERROR_MSG "failed to setlocale."
#endif

#ifdef _WIN32
int wmain(int argc, wchar_t *argv[])
#else
int main(int argc, char *argv[])
#endif
{
    // set locale to UTF-8
    const char *locale = setlocale(LC_ALL, WANT_LOCALE);
    if (locale == NULL) {
        fprintf(stderr, "ERROR: %s\n", LOCALE_ERROR_MSG);
        return 1;
    }

    if (argc != 2) return 1;

    char *argv1 = NULL;
#ifdef _WIN32
    // convert the cmdline argument from UTF-16 to UTF-8
    // (wcstombs produces UTF-8 once the UTF-8 locale is active)
    size_t length = wcstombs(NULL, argv[1], 0);
    if (length == (size_t)-1) return 1;  // argument cannot be converted
    argv1 = malloc(length + 1);
    if (argv1 == NULL) return 1;
    wcstombs(argv1, argv[1], length + 1);
#else
    argv1 = argv[1];
#endif

#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif

    printf(u8"你好,%s!😊\n", argv1);
    return 0;
}
$ /ucrt64/bin/gcc -municode main.c
$ ./a.exe 😆世界
你好,😆世界!😊
$ ./a.exe 😆世界 | xxd
00000000: e4bd a0e5 a5bd efbc 8cf0 9f98 86e4 b896  ................
00000010: e795 8cef bc81 f09f 988a 0d0a            ............

setlocale on UCRT can replace the manifest.rc in libwinsane, except that we need to use gcc -municode and wmain instead of main to get Unicode arguments. Once argv is converted to UTF-8, everything should work as well as it does with libwinsane. In theory, the UTF-8 setlocale even supports Windows 7 if UCRT is installed on the system.

IMO it would be nice to have some UCRT variants (w64devkit-ucrt, w64devkit-ucrt-mini, and w64devkit-ucrt-fortran). It should be as simple as --with-default-msvcrt=ucrt (as MSYS2 did).

skeeto commented 1 year ago

Thanks for the example, but that's exactly what I meant by "useless". You essentially wrote two distinct programs interleaved with #ifdef. UCRT has done nothing for you. The setlocale call comes too late: argv and envp have already been trashed by then, which is why you needed the non-standard entry point and conversion as a workaround. If you need to do that, you might as well deal directly with Win32 yourself. It's the same amount of work, but more portable across Windows variants (older Windows, Wine, etc.), it does not depend on any particular CRT behavior, and you can accomplish more, like reading wide console input.

Case in point: look at u-config, the pkg-config implementation I wrote for w64devkit. It's highly portable across decades of operating systems and compilers (from Windows NT to Linux), including wide paths and arguments, and all without linking a CRT, i.e. it doesn't matter whether it's an MSVCRT or UCRT toolchain. Running the following in cmd.exe (as the busybox-w32 shell is still narrow-only) demonstrates printing wide errors, too:

C:\>pkg-config π
pkg-config: could not find package 'π'
C:\>echo >π.pc
C:\>pkg-config --with-path=. π
pkg-config: missing field 'Name' in './π.pc'

It's the same concept: non-standard entry point, convert wide argv/envp to UTF-8, and the bulk of the program is platform agnostic. The difference is segregating entry points into platform layers instead of interleaving them with #ifdef. A UTF-8 locale wouldn't have helped.
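A rough sketch of that layering: a platform-agnostic core that only ever sees UTF-8 `argv`, plus a thin Windows layer that converts the wide arguments. This is a simplification, not u-config's actual code. It still relies on the CRT's `wmain` (built with `gcc -municode`) and `printf`, whereas u-config avoids the CRT entirely, and it skips envp. The names `app_main` and `utf16_to_utf8` are hypothetical.

```c
#include <stdio.h>

/* Platform-agnostic core: argv is always UTF-8 here. */
int app_main(int argc, char **argv)
{
    if (argc != 2) return 1;
    printf("hello, %s!\n", argv[1]);
    return 0;
}

#ifdef _WIN32
/* Windows platform layer (would normally live in its own source file). */
#include <stdlib.h>
#include <windows.h>

/* Converts a NUL-terminated UTF-16 string to a malloc'd UTF-8 string. */
static char *utf16_to_utf8(const wchar_t *w)
{
    /* With cchWideChar = -1, the returned size includes the terminator. */
    int len = WideCharToMultiByte(CP_UTF8, 0, w, -1, NULL, 0, NULL, NULL);
    if (len <= 0) return NULL;
    char *s = malloc(len);
    if (s && !WideCharToMultiByte(CP_UTF8, 0, w, -1, s, len, NULL, NULL)) {
        free(s);
        s = NULL;
    }
    return s;
}

int wmain(int argc, wchar_t **wargv)
{
    char **argv = calloc(argc + 1, sizeof(*argv));  /* NULL-terminated array */
    if (!argv) return 1;
    for (int i = 0; i < argc; i++) {
        argv[i] = utf16_to_utf8(wargv[i]);
        if (!argv[i]) return 1;
    }
    SetConsoleOutputCP(CP_UTF8);  /* let the console render UTF-8 output */
    return app_main(argc, argv);
}
#endif
```

On POSIX systems the matching layer is just `int main(int argc, char **argv) { return app_main(argc, argv); }`, so no `#ifdef` ever touches the core.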

I'm glad you noticed my libwinsane hack. It's not something I like to use in real programs, but at least for Windows 10 and up you get all the UTF-8 goodness without any special Windows code in the source, and it works just fine with MSVCRT. That's a lot more useful than UCRT for making plain old C programs behave better on Windows. (It's the behavior UCRT should have had were it designed/implemented properly!)

Finally, there's a cost to changing the locale: it globally (!) changes the behavior of various standard functions in mostly undesirable ways. Few programs want locale-dependent behavior. Many will not work correctly outside the C locale, and it's usually bad for performance (e.g. try GNU sort with and without LC_ALL=C). One of the big issues is strtod parsing floats differently: in a locale whose decimal separator is a comma, strtod stops parsing "3.5" at the "." and returns 3.
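A small sketch of the strtod pitfall and one common mitigation, assuming a comma-decimal locale such as "de_DE.UTF-8" is installed (locale names vary by platform). `parse_double` treats input as valid only if strtod consumes all of it. `parse_double_c` temporarily switches LC_NUMERIC to "C" and restores it afterwards. Both names are hypothetical, and because setlocale is process-global, the switch is not thread-safe.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 only if strtod consumed the entire string. */
static int parse_double(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return end != s && *end == '\0';
}

/* Parses with "C" numeric rules regardless of the current locale.
   Not thread-safe: setlocale affects the whole process. */
static int parse_double_c(const char *s, double *out)
{
    char saved[256];
    const char *cur = setlocale(LC_NUMERIC, NULL);
    /* setlocale's result may be overwritten by the next call, so copy it. */
    if (!cur || strlen(cur) >= sizeof saved) return 0;
    strcpy(saved, cur);
    setlocale(LC_NUMERIC, "C");
    int ok = parse_double(s, out);
    setlocale(LC_NUMERIC, saved);
    return ok;
}
```

In the "C" locale, `parse_double("3.5", &d)` succeeds with 3.5. After `setlocale(LC_NUMERIC, "de_DE.UTF-8")`, the same call fails because parsing stops at the ".", while `parse_double_c` still succeeds.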