Closed hannesm closed 4 years ago
I'm interested in your opinion, and am open for suggestions: given that BYTE_ORDER will always be LITTLE_ENDIAN, is it worth to keep the 12 lines of defines?
I'd keep them. In fact, the compiler predefines a __BYTE_ORDER__
(see https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html, also present on clang), so the arch-dependent block of #ifs can be modified suitably to depend only on the compiler's idea of endianness, which makes endian.h entirely generic and future-proof, which is good.
should -D freestanding be renamed to something else?
Dunno. -D__freestanding__
seems fine to me.
should
-D__freestanding__
be renamed to something else?Dunno.
-D__freestanding__
seems fine to me.
After thinking about it more, I renamed it to -D__mirage_freestanding__
. For people from the outside (who are not familiar with MirageOS internals), it is easier to relate to something where mirage
is present in the name.
After thinking about it more, I renamed it to -D__mirage_freestanding__. For people from the outside (who are not familiar with MirageOS internals), it is easier to relate to something where mirage is present in the name.
If it's not just __freestanding__
then if you don't mind I'd prefer __ocaml_freestanding__
since that matches the name of the package that introduces the flag. Googling for "ocaml freestanding" leads here, and by extension to Mirage, so people will make the association anyway.
@mato sure, done
@hannesm
I wonder if there's a way to make this work both in the presence and absence of Solo5/solo5#453 -- the problem I see is that it's technically an API-breaking change between layers, but I'd like to avoid going through the whole opam upper/lower bounds dance for it.
I started writing up notes on the various possible failure cases, but then got distracted by a video call and have lost most of the context now. One case I can remember is:
If there is some downstream consumer that, on a FreeBSD build host with a released version of Solo5 and ocaml-freestanding with this change, includes (transitively or otherwise) both sys/endian.h
(getting the system one mistakenly provided by Solo5) and endian.h
(getting this new one), then the compilation will fail due to various duplicate definitions.
What would help a lot is if you could try and track down all the currently released consumers of a endian.h
or sys/endian.h
downstream of ocaml-freestanding. i.e. is it potentially used by existing releases of gmp-freestanding
? mirage-cypto
? ocaml-nocrypto
?
That would at least help identify if we need to do something about the failure cases/API break or if we can mostly ignore it.
I could find the following users of endian.h - nocrypto, mirage-crypto, digestif. They are all using the same copied-around bitfn.h
which has an #ifdef
chain based on __linux__
/ __FreeBSD__
-- without a case for __ocaml_freestanding__
--> the current released packages will continue with a release of ocaml-freestanding with this PR merged.
They will all fail with an updated solo5 (on BSD systems, due to lack of sys/endian.h
). The only way to mitigate this as far as I can tell is to add conflict
statements to the existing packages for "solo5-{kernel,bindings}-{hvt,virtio}" {> "0.6.4"} (os = "freebsd" | os = "openbsd")
.
Safety of this PR: it may lead to multiple defined symbols if and only if something #include <sys/endian.h>
and #include <endian.h>
. I have not observed any library doing this in the wild. To mitigate such a situation, I would be fine to include (EDIT: this only helps if our EDIT: just define both, as done in the last commit.<endian.h>
is included second (i.e. #include <sys/endian.h>; #include <endian.h>
))
#ifndef _SYS_ENDIAN_H //both FreeBSD and OpenBSD sys/endian.h define this.
#ifndef _ENDIAN_H
#define _SYS_ENDIAN_H
#define _ENDIAN_H
... our endian.h ...
Once this PR is merged and released, there need to be update to digestif and mirage-crypto (to continue to work on BSD systems - which will then depend on ocam-freestanding > 0.5.0), the smallest possible patch is in bitfn to start with:
#ifdef __ocaml_freestanding__
#include <endian.h>
... reset of elif chain ...
8a48770b75f09952acfdb46d1ad705225039351c defines both _SYS_ENDIAN_H and _ENDIAN_H (I'm not sure whether this is sensible tbh, as mentioned nobody should include both <sys/endian.h>
and <endian.h>
). I'd be fine to drop that commit and consider it not ocaml-freestanding's problem.
The commit 5dfe5294e4d2d6dc7a8421bdbac7989af12e1673 is slightly more interesting:
BYTE_ORDER
, LITTLE_ENDIAN
and BIG_ENDIAN
(instead of __LITTLE_ENDIAN
, __BIG_ENDIAN
, and __BYTE_ORDER
). To me it looks that preprocessor macros with __
prefix are only used by C compiler builtins (i.e. cc -dM -E - < /dev/null
), and conventional endian.h
files use the definitions without underscores (1)__builtin_swapYY
functions instead of handcrafting custom ones. all C compilers I could get my hands on provide these builtins (the previous functions were taken from musl, but modified to compile without warnings with clang).(1) is important imho, even for the above linked bitfn.h
, which checks #if LITTLE_ENDIAN == BYTE_ORDER
... #eilf BIG_ENDIAN == BYTE_ORDER
. Previously, on a Linux system with solo5 the first if
test evaluated to true because neither LITTLE_ENDIAN
nor BYTE_ORDER
were defined. IMHO a bit dangerous.
In respect to undefining preprocessor defines, mini-os pkg-config file does: -U__linux__ -U__FreeBSD__ -U__sun__ -U__linux -D__MINIOS__
(which is partial according to a remote debian host, where cc defines as well __gnu_linux__
and linux
). I can see value in undefining these preprocessor-defines that are used for conditional compilation, but I as well understand to avoid maintaining such a list (that's why there are no -U
in this PR). If we want list to undefine, I'd propose: I added a commit to undefine these, as outlined above. Makes the behaviour more deterministic across platforms. The list only needs to be updated when we support new platforms (macos, win32, NetBSD, ..), which does not happen too often.-U__linux__ -U__linux -Ulinux -U__gnu_linux__ -U__FreeBSD__ -U__OpenBSD__
(I'm still on the fence with this, and suspect it may be slightly more user-friendly / error-prone).
I added a commit which undefines the definitions mentioned above. The advantage of doing so is many-fold: the compilation of C sources with ocaml-freestanding is more uniform across platforms; and the breakage (in respect to nocrypto/mirage-crypto/digestif -- bitfn.h) is non-existing. In addition, the hacl package (which was the motivation to look into this issue for me) will just work as is (since their #ifdef
chain respects the C compiler __BYTE_ORDER__
as a last resort (which is now hit with ocaml-freestanding on any platform)). In mirage-crypto I removed all the #ifdef
in https://github.com/mirage/mirage-crypto/pull/56 and now depend on __BYTE_ORDER__
defined by the C compiler.
It is also more the truth of ocaml-freestanding -- this is neither linux nor FreeBSD nor anything else (and C code that uses #ifdef
chains to select the platform should better fail hard (at compilation) when freestanding is encountered than soft (at link time).
Given the current mess of POSIX and endian.h (where to put it, which symbols to export), maybe we should just not provide it in freestanding at all?
@hannesm Thank you very much for looking into this, I appreciate it. I've done a V2 inspired by your changes, but with a few other improvements in #74. Continuing the discussion there.
superseeded by #74
the code of endian.h is taken from musl, and adapted to make clang's warnings go away.
in addition,
-D__freestanding__
is passed via CFLAGS, which allows other C code to use#if defined(__freestanding__)
and conditionally do something.An example use of both endian.h and the freestanding is https://github.com/hannesm/mirage-crypto/commit/a66190dccf9c596c24dfc4a45a911ef9effafab7 (which compiles fine with this change and the corresponding solo5 PR).
I'm interested in your opinion, and am open for suggestions: given that BYTE_ORDER will always be LITTLE_ENDIAN, is it worth to keep the 12 lines of defines? should
-D __freestanding__
be renamed to something else?This PR allows us to unify the copied headers of solo5 a bit more (see https://github.com/Solo5/solo5/pull/453), and has been requested in #71. It will as well help us to eventually resolve https://github.com/mirage/hacl/issues/30.