skeeto / w64devkit

Portable C and C++ Development Kit for x64 (and x86) Windows
The Unlicense

does w64devkit include a regex header/lib ? With C. #127

Closed ggbbrr closed 3 weeks ago

ggbbrr commented 1 month ago

I'm trying to compile a C program with regex. I get "fatal error: regex.h: No such file or directory".

skeeto commented 1 month ago

That's a POSIX API: https://pubs.opengroup.org/onlinepubs/7908799/xsh/regex.h.html

w64devkit is for native Windows development, so it doesn't have POSIX compatibility beyond a few small odds and ends in Mingw-w64. You'll need to find, or write, a POSIX regex implementation which supports Windows.

PCRE includes a "pcre2posix" library implementing the POSIX API. I built it from source in w64devkit just now, and it wasn't terribly difficult to get up and running. I wrote a "regex.h" with one line:

#include <pcre2posix.h>

After which POSIX regex.h sample programs worked without changes. Caveat: if you're using the static library (default build), use --static with pkg-config so that it picks up libpcre2-8:

pkg-config --static --libs libpcre2-posix

Otherwise be sure to link libpcre2-8 last, because order matters.
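For reference, a minimal sketch of the kind of POSIX regex.h code that should work unchanged on top of such a shim. The helper name `matches` is hypothetical, and the sketch assumes only the standard POSIX calls `regcomp`, `regexec`, and `regfree`. With the static PCRE build, the link flags would come from the pkg-config command above.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the extended regular
 * expression in pattern, 0 if it does not match, and -1 if the pattern
 * fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Calling `matches("^[a-z]+[0-9]+$", "abc123")` from a small `main` is enough to confirm that the header and the library link correctly.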

A more laborious option is to wrap the regex engine in libstdc++, included in w64devkit, to present the C POSIX API, albeit with some regex incompatibilities.

https://en.cppreference.com/w/cpp/regex

(That could be an interesting project on its own.)

skeeto commented 1 month ago

Since it sounded like an interesting challenge, I just added a custom PCRE build to contrib/ that provides a POSIX regex library, and only that exact functionality, via the POSIX names and regex.h. You can't even tell it's PCRE behind the scenes. None of this will be distributed in w64devkit, and it's still up to you to build+install the library via the script.

This is the second such that I've made, and I'm planning to collect more in contrib/ in the future.

avih commented 1 month ago

> This is the second such that I've made, and I'm planning to collect more in contrib/ in the future.

Not sure if you mean more libraries in general, or more posix regex specifically, but if the latter, then there's a much smaller posix compliant (presumably) regex package, which is Henry Spencer's POSIX BSD implementation, originally in 4.4BSD.

This is the latest clean code I could find: https://github.com/garyhouston/regex, which includes a single bugfix. To build it in w64devkit: make lib (compiles about 5 files), and then add #include <stdlib.h> and #include <sys/types.h> to the generated regex.h.

libregex.a is about 50k stripped, and only supports ASCII locale (general UTF-8 works except for Unicode collating etc).

The same code, with some build conveniences and a few compiler warnings eliminated (and a pre-generated regex.h) but without real code changes, is here: https://github.com/garyhouston/rxspencer. To build it without cmake:

cc -c -DPOSIX_MISTAKE reg*.c
ar rcs libregex.a reg*.o
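A quick smoke test for a freshly built libregex.a could look like the sketch below. It is not part of either repository; the function name `first_group` is hypothetical, and it assumes only the standard POSIX interface (`regcomp`, `regexec`, `regerror`, `regfree`, `regmatch_t`), staying within ASCII input.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first capture group of pattern, as
 * matched in text, into out (capacity cap, truncating if needed).
 * Returns 0 on success, 1 if there is no match, and -1 if the pattern
 * does not compile, in which case out holds the regerror() message. */
int first_group(const char *pattern, const char *text, char *out, size_t cap)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] is the whole match, m[1] the first group */

    if (cap == 0) {
        return -1;
    }

    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        regerror(rc, &re, out, cap);
        return -1;
    }

    rc = regexec(&re, text, 2, m, 0);
    regfree(&re);
    if (rc != 0 || m[1].rm_so < 0) {
        return 1;
    }

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= cap) {
        len = cap - 1;
    }
    memcpy(out, text + m[1].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Linking a caller of this function against libregex.a exercises compilation, matching, submatch offsets, and error reporting in one go.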

This library is also part of NetBSD's pkgsrc as librxspencer, but I couldn't find its source in the pkgsrc files (the info link seems to point to the webpage of the GitHub rxspencer project owner).

I think both could be great as a vendored regex implementation, though not sure which I prefer yet (the latter has a lot of small compiler warning fixes - mostly casts, and I didn't look at them closely).

A slightly older version of this code (and nearly identical) is in the sources of 4.4BSD.

An evolution of this code is in current FreeBSD (and, I'm guessing, other BSDs) under lib/libc/regex. I did manage to compile it from FreeBSD 13.3 after some effort, but it did not work correctly (and it uses wint_t char types etc., not the simple ASCII which the original older code uses).

So FYI if you want this in addition to, or maybe instead of, the PCRE thingy.

skeeto commented 1 month ago

Thanks for the tip, @avih. I didn't know about this library. The generated header stuff is unfortunate, but the librxspencer fork addresses that, as well as general cleanup. If I were going to use or distribute it, I'd make a few more, small tweaks (fix the to{lower,upper} UB, change strcpy into memcpy, include guard in regex2.h, delete some unnecessary code, and a unity build), but that's probably it. Good find!
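To illustrate two of those tweaks in isolation (this is a generic sketch, not code from the library): passing a plain `char` with a negative value to `tolower` or `toupper` is undefined behavior, so the usual fix is a cast to `unsigned char`, and a `strcpy` can become a `memcpy` once the length is already known. The function name `lower_copy` is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity cap), lowercasing
 * it, and returns the number of bytes written, not counting the
 * terminator. Truncates if src does not fit. */
size_t lower_copy(char *dst, size_t cap, const char *src)
{
    if (cap == 0) {
        return 0;
    }

    size_t len = strlen(src);
    if (len >= cap) {
        len = cap - 1;
    }

    /* The length is known here, so memcpy does the job of strcpy. */
    memcpy(dst, src, len);
    dst[len] = '\0';

    for (size_t i = 0; i < len; i++) {
        /* The cast keeps bytes >= 0x80 from reaching tolower() as
         * negative values where char is signed. */
        dst[i] = (char)tolower((unsigned char)dst[i]);
    }
    return len;
}
```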

> Not sure if you mean more libraries in general

The other one is a garbage collector, libgc.c (a stripped down Boehm GC), so in general. Were I to use these libraries in a serious project, here's how I'd handle them, depending on the subset of features I want to use. If I take time to figure it out, I can capture it in contrib/.

There's another in 6f1f2b9b with Expat, where I'm considering skipping its build system when building w64devkit. The complete library is a mere three translation units, so a 25KLOC build system — substantially more code than the library itself — is rather silly, especially since most of it is waste (checking if size_t is defined, etc.). I'm even tempted to amalgamate it and include it in this repository, making it one less dependency to fetch, though that complicates the repository's licensing.

There's yet another here, though not exactly a library, substantial enough to warrant its own repository. It builds and installs an "f77" compiler without needing gfortran:

https://github.com/skeeto/f2c-w64devkit

My philosophy is that binary distributions should have ownership of their dependencies, either directly or through contract. If a dependency has an issue that requires attention, it must be possible to fix without waiting for unpaid, upstream volunteers to do it. Language-based package managers have no hooks to patch packages, placing you at the mercy of the upstream internet randos that maintain it. Linux distributions mostly have it right for themselves, where they start from an upstream source artifact, then maintain a local patchset on top, a kind of light fork, that integrates it into the distribution, independent of the upstream project.

That's why I'm uninterested in distributing libraries in w64devkit, aside from the essential toolchain runtime bits. (IMHO, Mingw-w64 goes too far when it includes POSIX functionality not in MSVCRT, but I'm not going to rip that out.) That even includes POSIX regular expressions.

avih commented 1 month ago

> That's why I'm uninterested in distributing libraries in w64devkit, aside from the essential toolchain runtime bits.

Yeah, I don't disagree.

Specifically about regex though, since you already have a PCRE based thingy, I thought you might be interested in something much smaller which provides possibly good enough functionality.

I think I'll use it (the spencer one) if I need regex, though TBH I don't need it often or really at all, but having a small windows-compatible POSIX regex lib is just too good to ignore IMHO.

FWIW, tcl used (uses?) an earlier version of the spencer V8 regex code as its regex engine, and so does less as fallback when system regex is unavailable - e.g. on Windows, though I've not yet seen projects which vendor the spencer BSD POSIX code, and I've always wanted to have POSIX regex API for windows...

avih commented 1 month ago

Off topic for this thread, but in general, autoconf (and I guess libtool) would be nice to have, especially to build from a git repo rather than a source tarball, but I don't know enough about autotools to begin with (other than using it to generate configure).