onetrueawk / awk

One true awk

"cannot ungetc" error will be triggered on some platforms #218

Closed KubaO closed 3 weeks ago

KubaO commented 6 months ago

In b.c, there's a solitary ungetc invoked in a loop, at https://github.com/onetrueawk/awk/blob/master/b.c#L935

In standard C, ungetc is only guaranteed to work once without an intervening seek or read. And thus, if ungetc is needed more than once, it will fail on some platforms. It is invoked in a loop more than once - at least in tests.

At the very least, it fails on Mingw64: the T.misc test "^RS matches the start of every input file" fails.

plan9 commented 6 months ago

thanks for spotting this. we'll revise fnematch again. [and it will go down as the most revised function in OTA]

mpinjr commented 6 months ago

> In b.c, there's a solitary ungetc invoked in a loop, at https://github.com/onetrueawk/awk/blob/master/b.c#L935
>
> In standard C, ungetc is only guaranteed to work once without an intervening seek or read.

I remember eyeing that loop suspiciously back in 2012 when I first wrote it, for the very reason you point out. I chose to let it be after testing several implementations (the BSDs, glibc on Linux, and perhaps OSX as well) and finding that they all provided much more than the 1 char guarantee (in some cases, if I recall correctly, unlimited except by available memory).

> thanks for spotting this. we'll revise fnematch again. [and it will go down as the most revised function in OTA]

“Once more unto the breach, dear friends, once more”

Unless someone else has already begun the work, I'll make it so that our beloved fnematch can split streams on even the most conservative libc. I'm setting up a new dev machine tonight. Give me a few days to get back with a pull request.

Hope you are all well, Miguel
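
For readers following along: one way to stop depending on the libc's ungetc capacity is to keep pushed-back characters in a private buffer. The following is only a minimal sketch of that idea, not the change that was made or proposed for fnematch. The names `pbstream`, `pb_init`, `pb_getc`, `pb_ungetc` and the `PB_MAX` limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PB_MAX 256  /* assumed capacity, chosen only for illustration */

/* Hypothetical wrapper: a FILE plus a private LIFO pushback stack. */
struct pbstream {
    FILE *fp;
    unsigned char buf[PB_MAX];
    size_t n;                   /* number of characters currently pushed back */
};

void pb_init(struct pbstream *s, FILE *fp)
{
    assert(s != NULL && fp != NULL);
    s->fp = fp;
    s->n = 0;
}

/* Return the most recently pushed-back character, else read from the file. */
int pb_getc(struct pbstream *s)
{
    if (s->n > 0)
        return s->buf[--s->n];
    return getc(s->fp);
}

/* Push c back; fails with EOF only when the private stack is full. */
int pb_ungetc(int c, struct pbstream *s)
{
    if (c == EOF || s->n == PB_MAX)
        return EOF;
    s->buf[s->n++] = (unsigned char)c;
    return (unsigned char)c;
}
```

Because the stack lives in the program, the number of characters that can be pushed back is the same on every platform, and the libc's ungetc is never called.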

KubaO commented 6 months ago

Thank you so much! For what it’s worth, with very few exceptions I’ve got awk’s test suite to pass when built natively on Windows using gcc-Mingw64, Visual Studio, and clang. I’ve also got the whole thing to work with just a C compiler and cmake as a build system and test driver, since the built-in shell on Windows is lol. No changes to source code needed either. Once ungetc stuff gets fixed, it’ll be truly the most portable awk out there. I’ll contribute the cmake stuff separately - still got work to do on it. It’ll enable continuous integration testing using github actions for both Unix and Windows environments, so hopefully it’ll be of some use :)

jleffler commented 6 months ago

You can find a question on Stack Overflow about the number of bytes of pushback for ungetc():

https://stackoverflow.com/questions/7814816/ungetc-number-of-bytes-of-pushback

The most recalcitrant platforms are AIX and HP-UX, both of which limit the pushback to 1 byte; Solaris limits the pushback to 4 characters. Having just checked on Solaris 11.3, HP-UX 11.31 and AIX 7.2, the limits there are unchanged from what they were in 2011. I've just updated my answer to report the (unchanged) limits on these more recent o/s versions.

I don't have data for Windows using MSVC, CygWin or MinGW. If someone does have that information, or can find and provide it, it would be useful to me. I can probably dig out MSVC data, but our work machines do not have CygWin or MinGW installed.
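
A probe for this limit can be quite small. The sketch below is not the program from the Stack Overflow answer; `pushback_limit` is a hypothetical helper that simply counts how many consecutive ungetc calls succeed before one returns EOF.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Count how many consecutive ungetc() calls succeed on fp, trying at
 * most max times. Hypothetical helper, for illustration only.
 * The caller should rewind() or fseek() afterwards, which discards
 * the pushed-back characters.
 */
size_t pushback_limit(FILE *fp, size_t max)
{
    size_t count = 0;

    while (count < max) {
        if (ungetc('x', fp) == EOF)
            break;              /* the library refused further pushback */
        count++;
    }
    return count;
}
```

A caller would typically open a file, read one character, call `pushback_limit(fp, 4096)` and print the result. The C standard only promises that the answer is at least 1.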

plan9 commented 6 months ago

thank you for the updates. @KubaO we are not planning any changes to the existing make structure at this point. @mpinjr i'm open to any changes that are straight-forward and minimalist. thanks

KubaO commented 6 months ago

That’s ok of course. I’ll put my stuff in a fork and sync it with upstream. It’s a nice code base - tiny for what it does! - Kuba

wajap commented 6 months ago

I compiled the program from

https://stackoverflow.com/questions/7814816/ungetc-number-of-bytes-of-pushback

as a native 64-bit Windows program using several compilers (MSVC, mingw-w64 and tcc [Tiny C Compiler]); all gave Error at count = 1. A native Windows program here means one that uses the DLLs already present in Windows.

Compiling with Cygwin or MSYS2 gives a program that uses a special DLL as a compatibility layer for the POSIX API. Those give No error up to count = 4095.

The failure on T.misc (due to being unable to ungetc 'a') using awk compiled with mingw-w64 did not happen on earlier versions (20231116, commit 9e254e5, was the last one that passed). Maybe this can help to solve the problem.

KubaO commented 6 months ago

In any case, the only portable assumption is of length=1. The exact platform details matter not, the C standard does… At least as far as I understand it, the OTA is meant to be portable C + minimum of POSIX needed to do the job.
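
To illustrate what stays within that length=1 guarantee, here is a minimal sketch of a one-character lookahead. The helper name `peekc` is hypothetical and does not come from the awk sources.

```c
#include <stdio.h>

/*
 * Look at the next character without consuming it. Exactly one
 * ungetc() follows each successful read, so this never asks for more
 * than the single character of pushback the C standard guarantees.
 */
int peekc(FILE *fp)
{
    int c = getc(fp);

    if (c != EOF)
        ungetc(c, fp);
    return c;
}
```

Anything that needs to look further ahead than one character has to buffer the input itself rather than lean on repeated ungetc calls.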

plan9 commented 3 weeks ago

this issue will remain unresolved.