hercules-390 / hyperion

Hercules 390
Other
246 stars 69 forks source link

Make hyperion compile on NetBSD and with fewer warnings. #245

Open Rhialto opened 6 years ago

Rhialto commented 6 years ago

These changes will allow hyperion to build on NetBSD. I have not seriously tested the executable yet but it starts. The last of the changes is an attempt to get rid of one of the many many many warning messages that gcc 4.8.4 produces. Because of them you can't find any serious issues, so I got rid of one class (but there are more, the one that seems most common now are printf format warnings).

ghost commented 6 years ago

the netbsd man page shows that here is no need or the capitalisation of the functions in ctype.h the pull request should be rejected

CTYPE(3) NetBSD Library Functions Manual CTYPE(3)

NAME isalpha, isupper, islower, isdigit, isxdigit, isalnum, isspace, ispunct, isprint, isgraph, iscntrl, isblank, toupper, tolower, -- character clas- sification and mapping functions

LIBRARY Standard C Library (libc, -lc)

SYNOPSIS

include

 isalpha(int c);

 isupper(int c);

 islower(int c);

 isdigit(int c);

 isxdigit(int c);

 isalnum(int c);

 isspace(int c);

 ispunct(int c);

 isprint(int c);

 isgraph(int c);

 iscntrl(int c);

 isblank(int c);

 toupper(int c);

 tolower(int c);

DESCRIPTION The above functions perform character tests and conversions on the inte- ger c.

 See the specific manual pages for information about the test or conver-
 sion performed by each function.

EXAMPLES To print an upper-case version of a string to stdout, the following code can be used:

       const char *s = "xyz";

       while (*s != '\0') {
           putchar(toupper((int)(unsigned char)*s));
           s++;
       }

SEE ALSO isalnum(3), isalpha(3), isblank(3), iscntrl(3), isdigit(3), isgraph(3), islower(3), isprint(3), ispunct(3), isspace(3), isupper(3), isxdigit(3), tolower(3), toupper(3), ascii(7)

STANDARDS These functions, with the exception of isblank(), conform to ANSI X3.159-1989 (ANSI C89''). All described functions, including isblank(), also conform to IEEE Std 1003.1-2001 (POSIX.1'').

CAVEATS The first argument of these functions is of type int, but only a very restricted subset of values are actually valid. The argument must either be the value of the macro EOF (which has a negative value), or must be a non-negative value within the range representable as unsigned char. Passing invalid values leads to undefined behavior.

 Values of type int that were returned by getc(3), fgetc(3), and similar
 functions or macros are already in the correct range, and may be safely
 passed to these ctype functions without any casts.

 Values of type char or signed char must first be cast to unsigned char,
 to ensure that the values are within the correct range.  The result
 should then be cast to int to avoid warnings from some compilers.  Cast-
 ing a negative-valued char or signed char directly to int will produce a
 negative-valued int, which will be outside the range of allowed values
 (unless it happens to be equal to EOF, but even that would not give the
 desired result).

NetBSD 7.1 May 6, 2010 NetBSD 7.1

Rhialto commented 6 years ago

I know the NetBSD man page, thank you very much. Of course I consulted it while preparing this patch. Hence the quote from it.

You missed the part where I #define these names. One may quibble about where to put the defines, or even if exactly these names should be used, but the casts are required to avoid dozens if not hundreds of cases of undefined behaviour.

diff --git a/hstdinc.h b/hstdinc.h
index 0fe2556b..3d7298d8 100644
--- a/hstdinc.h
+++ b/hstdinc.h
@@ -92,6 +92,17 @@
 #include <string.h>
 #include <setjmp.h>
 #include <ctype.h>
+#define Isspace(c)     isspace((int)(unsigned char)(c))
+#define Isprint(c)     isprint((int)(unsigned char)(c))
+#define Isdigit(c)     isdigit((int)(unsigned char)(c))
+#define Isxdigit(c)    isxdigit((int)(unsigned char)(c))
+#define Iscntrl(c)     iscntrl((int)(unsigned char)(c))
+#define Isalnum(c)     isalnum((int)(unsigned char)(c))
+#define Isalpha(c)     isalpha((int)(unsigned char)(c))
+#define Isupper(c)     isupper((int)(unsigned char)(c))
+#define Islower(c)     islower((int)(unsigned char)(c))
+#define Toupper(c)     toupper((int)(unsigned char)(c))
+#define Tolower(c)     tolower((int)(unsigned char)(c))
 #include <errno.h>
 #include <fcntl.h>
 #ifndef O_BINARY
jphartmann commented 6 years ago

Excuse me. What is the point? A double cast does not exactly convey credibility.

ghost commented 6 years ago

No i did not miss it, the man page tells that they are useless also instead of clobbering the function names for all the other well behaved systems it would have shown a better judgement to make the defines the other way around

include

ifdef <some NTEBSD flagZ

DEFINE the function names the other way aroun … … …

endif

no reason to clobber 47 sources when one source change would have been enough

On 24 Jan 2018, at 15:36, Rhialto The M. notifications@github.com wrote:

You missed the part where I #define these names:

diff --git a/hstdinc.h b/hstdinc.h index 0fe2556b..3d7298d8 100644 --- a/hstdinc.h +++ b/hstdinc.h @@ -92,6 +92,17 @@

include

include

include

+#define Isspace(c) isspace((int)(unsigned char)(c)) +#define Isprint(c) isprint((int)(unsigned char)(c)) +#define Isdigit(c) isdigit((int)(unsigned char)(c)) +#define Isxdigit(c) isxdigit((int)(unsigned char)(c)) +#define Iscntrl(c) iscntrl((int)(unsigned char)(c)) +#define Isalnum(c) isalnum((int)(unsigned char)(c)) +#define Isalpha(c) isalpha((int)(unsigned char)(c)) +#define Isupper(c) isupper((int)(unsigned char)(c)) +#define Islower(c) islower((int)(unsigned char)(c)) +#define Toupper(c) toupper((int)(unsigned char)(c)) +#define Tolower(c) tolower((int)(unsigned char)(c))

include

include

ifndef O_BINARY

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hercules-390/hyperion/pull/245#issuecomment-360154827, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUXcLX5sWatbgtYsb6VQIoAdqDU6wxvks5tNz_tgaJpZM4RrNVg.

Rhialto commented 6 years ago

@jphartmann it is required by the way the standard defines the ctype macros/functions. See the CAVEAT in the man page that was quoted above, which clearly says why two casts are necessary.

Rhialto commented 6 years ago

@herc4mac The casts need to be included somehow. Not just on NetBSD, but on all systems. It is just that on NetBSD the compiler complains louder, apparently. And unfortunately one cannot #define isspace(c) isspace((int)(unsigned char)(c)) . That would indeed be neater, but alas.

Randomly found webpage discussing the topic:

https://wiki.sei.cmu.edu/confluence/display/c/STR37-C.+Arguments+to+character-handling+functions+must+be+representable+as+an+unsigned+char

jphartmann commented 6 years ago

You are saying, in effect, that NetBSD is not POSIX compliant. I find that hard to believe.

Rhialto commented 6 years ago

@jphartmann no, I am saying the opposite. NetBSD tends to be stricter in checking that the programs one compiles are compliant.

The following code is NOT compliant:

char c = -3; /* valid value for a signed char! */
if (isspace(c)) .... /* invalid value for ctype! *.

because the value of c is not in the range of values that are defined for the ctype functions. This is true also for FreeBSD, but maybe they put the cast in their ctype macros or something; that is conceivable. Or they don't implement the macros using an array. In that case you wouldn't get the warning I saw (char used as array index).

jphartmann commented 6 years ago

I must bow out, then. My only BSD is FreeBSD.

Rhialto commented 6 years ago

In any case, the ctype changes can be left off the pull request; they are not needed to compile on NetBSD.

(But I would recommend including them anyway. I would also recommend getting rid of the many printf format warnings, which are also probably undefined behaviour, and distract from any more interesting warnings the compiler may have to offer.)

jphartmann commented 6 years ago

Some of the printf warnings are errors that the authors cannot be bothered to correct. In some cases issuing the message causes Hercules to crash. Doctor, doctor...

Rhialto commented 6 years ago

Hm, I was going to continue with those warnings and try to fix as many as possible in the pull request, but if the authors are not interested in that, I feel a lot less inclined...

ghost commented 6 years ago

most of the warning are about printing the thread ID and it looks like there is no simple solution the tid IIRC is an opaque structure and even printing properly the structure members might not give any useful information unless You are a wizard in debugging the internals of pthreads :-)

On 24 Jan 2018, at 16:14, Rhialto The M. notifications@github.com wrote:

Hm, I was going to continue with those warnings and try to fix as many as possible in the pull request, but if the authors are not interested in that, I feel a lot less inclined...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hercules-390/hyperion/pull/245#issuecomment-360166512, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUXcBBAI27hv4a1rjgxtCAqnJM3_sQPks5tN0jpgaJpZM4RrNVg.

Rhialto commented 6 years ago

It probably just needs a lot of casts then. to cast thread-ids to unsigned longs (which is how they seem to be printed in most cases). The typical warning I get is

/mnt/vol1/rhialto/cvs/other/hercules/hyperion/hthreads.c:89:9: warning: format '%lx' expects argument of type 'long unsigned int', but argument 7 has type 'TID' [-Wformat=]
         WRMSG( HHC90014, "I", ilk->name, ilk->tid, TRIMLOC( ilk->location ));
         ^

which can probably even be fixed in the WRMSG macro. The ones that are not about WRMSG would need individual fixing.

jphartmann commented 6 years ago

Not the thread ID. DB Trout's mangling of it.

jphartmann commented 6 years ago

The thread ID is a pointer. %p is appropriate.

ivan-w commented 6 years ago

Or is it ? According to linux pthread_self(3)

POSIX.1 allows an implementation wide freedom in choosing the type used to represent a thread ID; for example, representation using either an arithmetic type or a structure is permitted. Therefore, variables of type pthread_t can't portably be compared using the C equality operator (==); use pthread_equal(3) instead.

Also check IEEE Std 1003.1-2008, 2016 Edition (aka SuSv4 ?) for the rationale for pthread_equal.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_equal.html#

Rhialto commented 6 years ago

Looking at hthreads, TID can be different types depending on if posix threads or fish threads are selected. For fish threads, TID = HID = hthread_t = fthread_t = FT_W32_DWORD = unsigned long. For posix threads, TID = HID = hthread_t = pthread_t = (on my system, but that isn't portable) struct __pthread_st *. Yes, that is annoying to print correctly. Maybe cast to intptr_t, then to intmax_t, then print with %j. But that won't work if it is a structure type...

ETA: Stackoverflow has a question on this, and it isn't simple... https://stackoverflow.com/questions/1759794/how-to-print-pthread-t

ghost commented 6 years ago

ant to pile up more stones on the grave of c

both IEEE Std 1003.1, 2004 Edition
IEEE Std 1003.1-2008, 2016 Edition

say ( for almost all the functions in ctype.h )

The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF.

what in heaven does the above mean ???

anyway got curious an wrote a q&d test

for all

uint8_t ch int8_t ch char ch unsigned char ch

the call isdigit(ch)

does not give any warning even when using the -Wall flag

the compiler is Apple LLVM version 9.0.0 (clang-900.0.39.2)

Rhialto commented 6 years ago

@herc4mac That means, that he values you can pass to isdigit() (etc), are EOF, 0, 1, 2, 3, .... UCHAR_MAX (255, usually). Unfortunately, a char can have values -128, -127, -126, ..., -1, 0, 1, 2, 3, ... 127. That is if it is signed and 8 bits, which is often the case.

If your string contains non-ASCII characters, such as á or or ¢ encoded in ISO-8859-1 then your char will have a negative value. That's because the code value of such chars is >127 and when stuffed into a signed char, it gets negative.

The reason why this is relevant, is that traditionally / often the ctype macros are implemented similar to

#define isalpha(c)      ((int)((_ctype_tab_ + 1)[(c)] & _CTYPE_A))

which works fine for values EOF (-1 in this implementation), 0, 1, 2, ... 255. Because of such implementations, the POSIX description is the way it is.

But if the compiler sees that c is a char, and char is signed, it warns about that, because c might be negative. And a negative index in an array is usually trouble...

So, the portable solution involves casting c to an unsigned char, as seen in several references now. The part about further casting to int is for "some compilers" which may be picky here.

A reason why your compiler doesn't complain may be that it implements isdigit() in some other way. But a compiler that does not complain is unfortunately not proof of correctness.

srorso commented 6 years ago

It would have been nice if I had read this conversation, say, two weeks ago.

When I changed my e-mail address some months back to leave Yahoo, I messed up github notifications. Which means I only saw issue #243 and its excerpt of the pull request, not the complete pull request.

My mistake, since corrected. And nothing has been committed, so no harm done, except to my ego and my good standing here. Knowing my ego, it will be fine in a few hours.

The -march=native issue seemed like an easy drop-in to the CMake build. To verify, though, I would have to build Hercules on NetBSD and on other systems, and Hercules will not build on NetBSD without much of Pull Request #245. That request being more than just some fixes to configure.ac, addressing the -march=native issue is not a drop-in with respect to construction of the validation environment. And discovering that, at least within the scope of my testing farm, it is unique to gcc 4.8.4 on NetBSD 7.0.1, really lowers the urgency. While I dislike "use a different compiler" as an answer to -march=native, it may be the right answer, or at least a decent short term answer. Ivan's suggested workaround is also good.

I was interested in the configure.ac updates in issue #243 even though I am driving a wooden stake through configure.ac's heart by developing a cross-platform CMake build script. And if those updates were sufficient to enable a Hercules build on NetBSD, I would be good with that. But the configure.ac changes are not sufficient; a larger set of changes is needed. So this is not a drop-in I can include in regression testing configure.ac.

So for the moment I am suspending my efforts on this request in favor of finishing the project I was working on. I have archived my NetBSD virtual machines to offline storage for ready recovery, and I will return to validating the addition of Windows support to CMake, with next steps of a) validating macOS and b) Raspberry PI (using QEMU).

I do not see a consensus about the ctype functions. I am not skilled/experienced enough to make a contribution to the matter. (Yet; see above re ego.)

Looking at the cited stackoverflow page, it appears that the thread id might be a structure, so a set of casts may not reduce the structure to something that can be handled by printf(). The function call suggested could return a string, host-specific and most likely but not necessarily eight or 16 bytes, for inclusion in WRMSG/MSGBUF using %s. Because messages are not on the main emulation path efficiency considerations are not paramount. Getting rid of noise when compiling Hercules has value.

Changing SoftFloat-3a to a shared library makes sense; I have been thinking about this and there have been some off-list discussions about it. Apart from simplifying support in Hercules for NetBSD, it would eliminate some coding specific to Ninja and Softfloat in the coming update to CMake. Enrico's CMake script for SoftFloat-3a simplifies building a shared library. But integrating SoftFloat-3a as a shared library into Hercules would create requirements to change the CMake scripts (reasonable and expected), the GNU Autotools build (Ugh!), and the Windows NMake build (see the contents of subdirectory msvc.makefile.includes). Better to wait a bit.

The discussion, fact-based and respectful, reminds me of some of the discussions I really enjoyed when I was gainfully employed. I learned a lot from those discussions and from this one.

I am grateful for the opportunity to build NetBSD and develop a VirtualBox installation procedure for it (attached, as is the Oracle VirtualBox configuration script).

Again, my mistake, and I apologize to all...especially to Rhialto for creating the impression that this PR could be actioned quickly.

Best Regards, Steve Orso

NetBSD 7.0.1 Installation.pdf NetBSD764.cmd.txt

Rhialto commented 6 years ago

Hi Steve, No problem! I'm happy that it is on your radar. I noticed that the cmake build doesn't produce all those ctype argument warnings. I suppose that is because it has different warning settings. (The actual issue that the warning is about still exists). I'm not married to my particular way of rectifying the ctype arguments; this way is just how I usually did it so far but there may well be others (perhaps inline functions with the same name as the macros; not sure how that relates to relevant standards). And, let me remark here that I do find the ctype warnings as annoying as anyone else. But on the other hand I'm convinced that they are justified and should be acted upon somehow.

srorso commented 6 years ago

Returning to this pull request: There are eight commits. I plan to run a test series with the following five commits that seem to be required for compilation on NetBSD 7.0.1, supplemented by any CMake build changes needed to deal with the oddities of gcc 4.8.4 on NetBSD 7.0.1.

9efbcb8 This looks like a typo (variable does not match header file name)
e8c20aa More careful use of includes when testing for further features.
661d301 Treat NetBSD same as FreeBSD.
3074d93 Add "Hard-coded NetBSD-specific features and options..."
3886f5b Force libtool to accept .a files to link into shared libs

Testing will include BSD, Linux, macOS, Solaris and Windows systems.

Once tested, I will use git cherry-pick to include these commits into the Hercules-390 hyperion repository. Use of cherry-pick will preserve original authorship and give credit to Rhialto for his efforts.

I will defer action on the following commits:

73498d3 Add configure check for pthread_rwlockattr_setpshared().

I do not understand the need for this because pthread_rwlockattr_setpshared does not seem to be used in Hyperion. If this is not correct, it will become apparent during testing, and I will include it.

7b3cca1 Add configure check for pthread_getcpuclockid().

Peter Jensen has committed corrections for this issue. Excluding this commit will avoid the issues presented by the conflict in config.c noted above.

19b74bc Get rid of one class of the many many many many warnings from gcc 4.8.4:

This is a good thing to be working on, but it is separate from the task of enabling a build on NetBSD.

srorso commented 6 years ago

Hi Rhialto,

Regarding:

I will defer action on the following commits:

73498d3 Add configure check for pthread_rwlockattr_setpshared().

I do not understand the need for this because pthread_rwlockattr_setpshared does not seem to be used in Hyperion. If this is not correct, it will become apparent during testing, and I will include it.

Building without this commit led to a bunch of research, the result of which is that for the moment, at least, I will make the setpshared() call conditional on the existence of the associated POSIX macro. This is consistent with what Peter has done with the thread clock issue.

Best Regards, Steve Orso