raygard / wak

wak -- an awk implementation for toybox and standalone
BSD Zero Clause License

Versioned release #7

Open absolutelynothinghere opened 2 months ago

absolutelynothinghere commented 2 months ago

Are there plans to make a versioned (numbered) release of wak?

Thank you.

raygard commented 2 months ago

I had not thought about that yet. Do you think it is mature enough for that yet? Are you using it?

absolutelynothinghere commented 2 months ago

Are you using it?

It seems to satisfy autotools so I suppose yes, but I haven't tested wak beyond that. The busybox test suite might come in handy.

Do you think it is mature enough for that yet?

Only you can decide when it is ready; I was simply making a suggestion for the future.

davidar commented 2 months ago

I'm also using it to keep various build scripts (including autotools) happy. It works quite well with tcc too, I've just been putting

#!/usr/local/bin/tcc -run

at the top of the monolithic source file and running it like a script

oliverkwebb commented 2 months ago

I'm also using it to keep various build scripts (including autotools) happy. It works quite well with tcc too, I've just been putting

#!/usr/local/bin/tcc -run

at the top of the monolithic source file and running it like a script

If you don't mind sharing, what environment (libc version, tcc version, operating system) are you running? I took a shot at running it like a script and got an error:

$ ./monosrc/mono.c
In file included from ./monosrc/mono.c:24:
/usr/include/regex.h:682: error: '__nmatch' undeclared

That line of /usr/include/regex.h on my system (glibc 2.39-2 (x86_64) on Arch Linux) is:

extern int regexec (const regex_t *_Restrict_ __preg,
        const char *_Restrict_ __String, size_t __nmatch,
        regmatch_t __pmatch[_Restrict_arr_
          _REGEX_NELTS (__nmatch)],
         int __eflags);

Which evaluates to (when removing the irrelevant "const" and "restrict" stuff):

extern int regexec (regex_t *__preg, char *__String, size_t __nmatch, regmatch_t __pmatch[__nmatch], int __eflags)

The problem seems to be:

$ tcc -run -
extern int a(int b, char c[b]);
<stdin>:1: error: 'b' undeclared
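
For reference, the construct being rejected is a C99 array parameter whose bound names an earlier parameter, which is the shape of the `regexec` prototype in glibc's header. A minimal sketch of that form follows; `count_nonzero` is a hypothetical function made up for illustration and is not part of wak or glibc. A compiler with support for this syntax should accept both the prototype and the definition.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype: the bound of `values` refers to the earlier parameter `n`,
 * the same pattern as `regmatch_t __pmatch[__nmatch]` in regex.h. */
size_t count_nonzero(size_t n, const int values[n]);

/* The array parameter is adjusted to a pointer, so the bound is
 * documentation for the reader (and the compiler's diagnostics). */
size_t count_nonzero(size_t n, const int values[n])
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] != 0)
            count++;
    }
    return count;
}
```

A compiler that fails on the one-line test above would be expected to fail on this prototype in the same way, since the bound `n` must be looked up among the parameters already declared.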

Pointing it at my musl headers gives a different error in a typecast for bits/alltypes.h

My version of tcc might also just be old, and someone may have patched in support for this in a different version. I don't know.

davidar commented 2 months ago

I'm using a recent build of the mob branch of https://repo.or.cz/w/tinycc.git. I've used glibc 2.35 on ubuntu 22.04, as well as my own fork of musl (https://github.com/davidar/musl/tree/tcc) which just has a handful of changes allowing musl itself to be built with tcc. If you're using a versioned release of tcc then it's probably missing quite a lot of changes as the last one was 6 years ago.

$ tcc -v
tcc version 0.9.28rc 2024-04-09 mob@6b3cfdd0 (x86_64 Linux)
$ tcc -run -
extern int a(int b, char c[b]);
main(){puts("ok!");}
-:1: warning: implicit declaration of function 'puts'
ok!

oliverkwebb commented 1 month ago

I'm using a recent build of the mob branch of https://repo.or.cz/w/tinycc.git.

I didn't know tcc was still being updated. Thanks!

My version of tcc was "tcc version 0.9.27 (x86_64 Linux)", which has a 2000-2006 copyright notice (it's the latest version the Arch repos provide).

Using the version you pointed to, I can not only compile wak but also, with some minimal changes, compile toybox about 5.5x faster than with gcc (28s -> 5s). The changes were adding #define QUIET in lib/portability.h (because it is only defined there when using gcc) and removing the LDOPTIMIZE flags that tcc doesn't have. It also compiles mksh, nawk, sbase, and many other programs.

Like I said, thanks!

davidar commented 1 month ago

@oliverkwebb No worries! I've been using tcc (and wak) for https://github.com/davidar/bootsh, it's quite impressive how much software can be compiled with it these days, and indeed it is quite fast. Having the one-file implementation of wak has been quite useful for bootstrapping it separately without bloating the main bootsh binary, so I hope that wak will remain a standalone project in addition to the option of integrating it with toybox. It's been quite indispensable too, given that so many C build systems rely on awk in one way or another, so thanks for your work on this compact and portable implementation :)

raygard commented 1 month ago

@davidar David, you're welcome! I'm glad someone is using wak to good purpose. Took a while to write it (understatement there), and I don't know if it will be accepted into toybox; I wrote it sort of on spec. Let me know if you find bugs (it has some for sure) or have ideas for enhancements. Trying to keep it small, but will consider it. I do intend to fix any real bugs but will probably not be doing much more unless/until Rob Landley gets it into toybox.

I tried tcc and it's amazingly fast; execution a bit slow but that's OK for what I'd want it for. Questions: I tried the #!/usr/local/bin/tcc -run to run it as a script and that worked fine. When I tried compiling, it didn't find the math functions until I added -lm to the tcc mono.c command line. Why does the -run thing work without -lm?

davidar commented 1 month ago

@raygard Generally speaking it's been working pretty well, but one issue that I've run into is with "interactive" awk scripts. For instance:

$ mawk '/^say / { print $2 }'
say hello
hello
say goodbye
goodbye
^D
$ ./wak.c '/^say / { print $2 }'
say hello
say goodbye
^D
hello
goodbye
^D

It seems like it's not line-buffering the input, though I'm slightly confused why that is as, looking at the code, it seems like it should be (at least in get_char, not sure about getrec_ functions). I can kind of get it to work by pressing ctrl-D after each line of input:

$ ./wak.c '/^say / { print $2 }'
say hello
^D
hello
say goodbye
^D
goodbye
^D

mawk behaves similarly in some circumstances (https://github.com/ThomasDickey/original-mawk/issues/41#issuecomment-241070898) because it block-buffers its input. Is wak doing something similar?

EDIT: I figured out how to make this work (https://github.com/davidar/bootsh/commit/acee3572e83f4b008db0f7b3ab705b3306525101), I can submit a PR for it if you like
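
To illustrate the difference between block-buffered and line-at-a-time input, here is a minimal sketch. It is not wak's code and not the change in the linked commit; `read_available` and `first_record_len` are hypothetical helpers. The idea is that a single `read()` returns as soon as the terminal (or pipe) has delivered some data, so a record can be processed before EOF, whereas a loop that keeps reading until a large buffer is full will sit waiting for more input.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fetch whatever input is ready with ONE read() call.
 * It retries only when interrupted by a signal; it does not loop to fill
 * the buffer, so it returns as soon as a line arrives from a terminal. */
ssize_t read_available(int fd, char *buf, size_t cap)
{
    ssize_t n;
    do {
        n = read(fd, buf, cap);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Hypothetical helper: length of the first complete record in buf[0..len),
 * including its trailing newline, or 0 if no full record is present yet. */
size_t first_record_len(const char *buf, size_t len)
{
    const char *nl = memchr(buf, '\n', len);
    return nl ? (size_t)(nl - buf) + 1 : 0;
}
```

A caller would append each chunk to its record buffer, hand every complete record to the pattern/action loop, and keep any trailing partial line for the next call.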

Why does the -run thing work without -lm?

Good question, initially I assumed it was just automatically adding common libraries for convenience, but that doesn't appear to be the case. I think it's actually just because -run executes the program in the same process as tcc, which itself is already linked to libm, so it's visible without needing to explicitly load it.
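
One way to probe that hypothesis is to ask the dynamic loader whether a symbol is already visible in the running process. The sketch below assumes that in-process symbol resolution behaves like a `dlsym` lookup against the program's global scope; whether tcc's `-run` does exactly this is an assumption, not something confirmed in this thread. `visible_in_process` is a hypothetical helper name.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: report whether `name` can be resolved from the
 * running program and the shared libraries it has already loaded.
 * A symbol whose value is legitimately NULL would be reported as absent;
 * that corner case is ignored in this sketch. */
int visible_in_process(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW);  /* handle for the program itself */
    int found;

    if (self == NULL)
        return 0;
    found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

In a host program that is linked against libm, `visible_in_process("cos")` would be expected to return 1 even though the code asking for it was never linked with -lm, which matches the behaviour described above. On older glibc versions this sketch may itself need -ldl when compiled ahead of time.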