stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
140 stars 44 forks source link

Add c bytecode target for stanc #1311

Closed andrjohns closed 1 year ago

andrjohns commented 1 year ago

This draft PR is for testing and getting feedback on an approach to bundling the C bytecode for stanc, and the minimum necessary sources from ocaml, so that users can compile stanc locally (no more precompiled binaries!).

From my (limited) local testing, this is working across Linux, Mac, and Windows

andrjohns commented 1 year ago

@serban-nicusor-toptal it looks like the docker image is missing the file utility:

./configure: line 6771: /usr/bin/file: No such file or directory

Would you mind updating the image to install it when you get the time? Thanks!

andrjohns commented 1 year ago

Alright, this is ready for review.

I've added a new makefile Makefile.package-stanc which contains the recipes for obtaining the stanc C bytecode and extracting the ocaml sources and headers that are needed to compile it.

These are placed in a folder (stanc) which has the Makefile and main.c needed for compiling a standalone binary. Currently this process requires the Ocaml configure script to generate machine-specific headers (caml/m.h and caml/s.h) but this is probably something that could be streamlined/simplified.

I've included a github actions which shows the bytecode and C files being generated under ubuntu, and then being built under ubuntu, mac, and windows and running the cmdstan tests

andrjohns commented 1 year ago

Thanks! It's been one of those things that's been stuck in my "should be possible" list for a while!

Absolutely agree that it doesn't have to (or should) live in stanc3. I could setup another repo (stanc3-bytecode?) with CI to regularly pull in stanc3 and update the bytecode & sources (and test). Then if nothing breaks after a release cycle or two of normal development, we could open a discussion about including that in cmdstan as a submodule to replace the stanc3 binary?

WardBrian commented 1 year ago

Do you have sufficient permissions that you can make a new repo under the stan-dev organization's name?

we could open a discussion about including that in cmdstan as a submodule to replace the stanc3 binary?

I'd be curious if there's a noticeable performance difference between the bytecode executables this produces and the native binaries we currently ship.

I'd probably argue in favor of including the linux-stanc, mac-stanc, and windows-stanc binaries in releases, and probably keep our CI downloading nightly releases rather than building the bytecode version from source, but it definitely seems like a reasonable option for platforms other than the "big three" or if there are users who don't want to use something they didn't compile (not that the bytecode file is particularly readable, but that's another story).

Am I correct in guessing you'll want to move RStan to use this rather than stanc3js?

andrjohns commented 1 year ago

Do you have sufficient permissions that you can make a new repo under the stan-dev organization's name?

Good shout, no I don't. But thinking it through more, it might be too early to have the repo under stan-dev since that might provide some kind of implicit endorsement by Stan - while this is probably still a bit too experimental. I'll open the repo under my account instead and we can discuss moving it under the stan-dev umbrella later if all appears stable.

I'd be curious if there's a noticeable performance difference between the bytecode executables this produces and the native binaries we currently ship.

Me too, I could add some timing tests/comparisons as a future goal for the separate repo CI

I'd probably argue in favor of including the linux-stanc, mac-stanc, and windows-stanc binaries in releases

Interesting, I was thinking it would be easier to have this as the main format for stanc - with the native binaries for those that have some need for speed (e.g., a workflow involving parsing many many models). Since this would allow for us to distribute a single cmdstan release source which could be used for every platform. But I don't have any hard plans or strong opinions on that.

One of the other places where I see this being useful is that it makes it much easier to package stanc for package distribution platforms (homebrew, apt, etc.) as a standalone program. But I'm not sure how much of a usecase there is for that just yet.

Am I correct in guessing you'll want to move RStan to use this rather than stanc3js?

Still a bit early to say. CRAN has strict requirements on the C/C++ in packages (has to compile without warnings under -Wpedantic) which can be a bit of pain if you're working with external code (currently dealing with this for replacement V8 engine). I haven't tried building under -Wpedantic yet to see how much work would be needed

WardBrian commented 1 year ago

Sounds good. I’m very interested in how it all shakes out!

The wpedantic restriction would probably be an issue is my guess. Shame, because using an executable of stanc that could access the file system would fix a lot of issues with things like how RStan handles #include directives

andrjohns commented 1 year ago

I've changed this PR to just be adding the (byte c) target to the stanc dune file. The (best exe) is the current default behaviour, so there's no change to the current executables

WardBrian commented 1 year ago

One thing to flag @andrjohns is that the OCaml source code and runtime are distributed under the LGPL. This would mean that the bytecode source (which must actually distribute the source, rather than just link to a library) would fall under the LGPL as well.

I don't believe this is necessarily a huge issue, since none of the code which would be affected is incorporated into the resulting Stan programs, so they would be unaffected (like how GCC is GPL, but the produced executables are not), but wanted to point it out

andrjohns commented 1 year ago

Ahh good point. I'll read up a bit on the details so I don't run into any surprises