mirage / ocaml-solo5

Freestanding OCaml runtime
Other
100 stars 30 forks source link

Provide byte-swapping functions in endian.h, pass -D__ocaml_freestanding__ via CFLAGS #73

Closed hannesm closed 4 years ago

hannesm commented 4 years ago

the code of endian.h is taken from musl, and adapted to make clang's warnings go away.

in addition, -D__freestanding__ is passed via CFLAGS, which allows other C code to use #if defined(__freestanding__) and conditionally do something.

An example use of both endian.h and the freestanding is https://github.com/hannesm/mirage-crypto/commit/a66190dccf9c596c24dfc4a45a911ef9effafab7 (which compiles fine with this change and the corresponding solo5 PR).

I'm interested in your opinion, and am open for suggestions: given that BYTE_ORDER will always be LITTLE_ENDIAN, is it worth to keep the 12 lines of defines? should -D __freestanding__ be renamed to something else?

This PR allows us to unify the copied headers of solo5 a bit more (see https://github.com/Solo5/solo5/pull/453), and has been requested in #71. It will as well help us to eventually resolve https://github.com/mirage/hacl/issues/30.

mato commented 4 years ago

I'm interested in your opinion, and am open for suggestions: given that BYTE_ORDER will always be LITTLE_ENDIAN, is it worth to keep the 12 lines of defines?

I'd keep them. In fact, the compiler predefines a __BYTE_ORDER__ (see https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html, also present on clang), so the arch-dependent block of #ifs can be modified suitably to depend only on the compiler's idea of endianness, which makes endian.h entirely generic and future-proof, which is good.

should -D freestanding be renamed to something else?

Dunno. -D__freestanding__ seems fine to me.

hannesm commented 4 years ago

should -D__freestanding__ be renamed to something else?

Dunno. -D__freestanding__ seems fine to me.

After thinking about it more, I renamed it to -D__mirage_freestanding__. For people from the outside (who are not familiar with MirageOS internals), it is easier to relate to something where mirage is present in the name.

mato commented 4 years ago

After thinking about it more, I renamed it to -D__mirage_freestanding__. For people from the outside (who are not familiar with MirageOS internals), it is easier to relate to something where mirage is present in the name.

If it's not just __freestanding__ then if you don't mind I'd prefer __ocaml_freestanding__ since that matches the name of the package that introduces the flag. Googling for "ocaml freestanding" leads here, and by extension to Mirage, so people will make the association anyway.

hannesm commented 4 years ago

@mato sure, done

mato commented 4 years ago

@hannesm

I wonder if there's a way to make this work both in the presence and absence of Solo5/solo5#453 -- the problem I see is that it's technically an API-breaking change between layers, but I'd like to avoid going through the whole opam upper/lower bounds dance for it.

I started writing up notes on the various possible failure cases, but then got distracted by a video call and have lost most of the context now. One case I can remember is:

If there is some downstream consumer that, on a FreeBSD build host with a released version of Solo5 and ocaml-freestanding with this change, includes (transitively or otherwise) both sys/endian.h (getting the system one mistakenly provided by Solo5) and endian.h (getting this new one), then the compilation will fail due to various duplicate definitions.

What would help a lot is if you could try and track down all the currently released consumers of a endian.h or sys/endian.h downstream of ocaml-freestanding. i.e. is it potentially used by existing releases of gmp-freestanding? mirage-cypto? ocaml-nocrypto?

That would at least help identify if we need to do something about the failure cases/API break or if we can mostly ignore it.

hannesm commented 4 years ago

I could find the following users of endian.h - nocrypto, mirage-crypto, digestif. They are all using the same copied-around bitfn.h which has an #ifdef chain based on __linux__ / __FreeBSD__ -- without a case for __ocaml_freestanding__ --> the current released packages will continue with a release of ocaml-freestanding with this PR merged.

They will all fail with an updated solo5 (on BSD systems, due to lack of sys/endian.h). The only way to mitigate this as far as I can tell is to add conflict statements to the existing packages for "solo5-{kernel,bindings}-{hvt,virtio}" {> "0.6.4"} (os = "freebsd" | os = "openbsd").

Safety of this PR: it may lead to multiple defined symbols if and only if something #include <sys/endian.h> and #include <endian.h>. I have not observed any library doing this in the wild. To mitigate such a situation, I would be fine to include (EDIT: this only helps if our <endian.h> is included second (i.e. #include <sys/endian.h>; #include <endian.h>)) EDIT: just define both, as done in the last commit.

#ifndef _SYS_ENDIAN_H //both FreeBSD and OpenBSD sys/endian.h define this.
#ifndef _ENDIAN_H
#define _SYS_ENDIAN_H
#define _ENDIAN_H
... our endian.h ...

Once this PR is merged and released, there need to be update to digestif and mirage-crypto (to continue to work on BSD systems - which will then depend on ocam-freestanding > 0.5.0), the smallest possible patch is in bitfn to start with:

#ifdef __ocaml_freestanding__
#include <endian.h>
... reset of elif chain ...
hannesm commented 4 years ago

8a48770b75f09952acfdb46d1ad705225039351c defines both _SYS_ENDIAN_H and _ENDIAN_H (I'm not sure whether this is sensible tbh, as mentioned nobody should include both <sys/endian.h> and <endian.h>). I'd be fine to drop that commit and consider it not ocaml-freestanding's problem.

The commit 5dfe5294e4d2d6dc7a8421bdbac7989af12e1673 is slightly more interesting:

(1) is important imho, even for the above linked bitfn.h, which checks #if LITTLE_ENDIAN == BYTE_ORDER ... #eilf BIG_ENDIAN == BYTE_ORDER. Previously, on a Linux system with solo5 the first if test evaluated to true because neither LITTLE_ENDIAN nor BYTE_ORDER were defined. IMHO a bit dangerous.

In respect to undefining preprocessor defines, mini-os pkg-config file does: -U__linux__ -U__FreeBSD__ -U__sun__ -U__linux -D__MINIOS__ (which is partial according to a remote debian host, where cc defines as well __gnu_linux__ and linux). I can see value in undefining these preprocessor-defines that are used for conditional compilation, but I as well understand to avoid maintaining such a list (that's why there are no -U in this PR). If we want list to undefine, I'd propose: -U__linux__ -U__linux -Ulinux -U__gnu_linux__ -U__FreeBSD__ -U__OpenBSD__ (I'm still on the fence with this, and suspect it may be slightly more user-friendly / error-prone). I added a commit to undefine these, as outlined above. Makes the behaviour more deterministic across platforms. The list only needs to be updated when we support new platforms (macos, win32, NetBSD, ..), which does not happen too often.

hannesm commented 4 years ago

I added a commit which undefines the definitions mentioned above. The advantage of doing so is many-fold: the compilation of C sources with ocaml-freestanding is more uniform across platforms; and the breakage (in respect to nocrypto/mirage-crypto/digestif -- bitfn.h) is non-existing. In addition, the hacl package (which was the motivation to look into this issue for me) will just work as is (since their #ifdef chain respects the C compiler __BYTE_ORDER__ as a last resort (which is now hit with ocaml-freestanding on any platform)). In mirage-crypto I removed all the #ifdef in https://github.com/mirage/mirage-crypto/pull/56 and now depend on __BYTE_ORDER__ defined by the C compiler.

It is also more the truth of ocaml-freestanding -- this is neither linux nor FreeBSD nor anything else (and C code that uses #ifdef chains to select the platform should better fail hard (at compilation) when freestanding is encountered than soft (at link time).

Given the current mess of POSIX and endian.h (where to put it, which symbols to export), maybe we should just not provide it in freestanding at all?

mato commented 4 years ago

@hannesm Thank you very much for looking into this, I appreciate it. I've done a V2 inspired by your changes, but with a few other improvements in #74. Continuing the discussion there.

hannesm commented 4 years ago

superseeded by #74