justinethier / cyclone

:cyclone: A brand-new compiler that allows practical application development using R7RS Scheme. We provide modern features and a stable system capable of generating fast native binaries.
http://justinethier.github.io/cyclone/
MIT License
823 stars 42 forks source link

Idea: Only bootstrap cyclone compiler with pre-existing compiler #476

Closed nmeum closed 2 years ago

nmeum commented 3 years ago

The code in this repository requires a cyclone compiler to be installed to compile (e.g. as obtained via cyclone-bootstrap). My understanding is that the Makefile currently compiles everything (the libraries, the cyclone binary, and the icyc binary) using this per-existing compiler. Furthermore, the Makefile also uses libraries etc. provided by the per-existing cyclone installation. This is probably also why the Development Guide recommends installing a modified cyclone first and then performing a clean build and run all tests. I personally noticed this while working on #473, since the icyc prompt was not flushed (after make clean && make) until I installed my modified cyclone version and then re-compiled icyc.

I am wondering if this is strictly necessary, shouldn't it be possible to only bootstrap the cyclone compiler (i.e. cyclone.scm) using the pre-existing installation and afterwards compile everything else (e.g. libs and icyc) using this bootstraped compiler (maybe even re-compiling the compiler itself with the newly compiled libs as a last step)? This would allow changes to library code etc. to become effective "immediately" without requiring the "make install and rebuild"-step.

For the cyclone Alpine Linux package I am currently working on, I encountered this problem too since I wanted to backport #473 as part of this package. icyc is barely usable on Alpine without it and installing and then re-compiling isn't easily possible as part of the package recipe. For this reason I currently build cyclone as follows:

# Bootstrap the cyclone compiler using the compiler provided by cyclone-bootstrap.
make -j1 cyclone

# With the bootstraped compiler, compile the libraries.
make -j1 CYCLONE="./cyclone -I . -COPT '-Iinclude' -CLNK '-L.'" libs

# Using the compiled libraries and the bootstrapped compiler, compile icyc etc.
make -j1 CYCLONE="./cyclone -I . -COPT '-Iinclude' -CLNK '-L.'" all

The problem with this primitive approach is probably that the build system does currently not seem to track prerequisites correctly everywhere (hence also the -j1). So if Scheme library A depends on library B, B might not be build before A and in that case it probably uses the library version B from cyclone-bootstrap but you get the idea. If my understanding of the build process is correct and my proposal is feasible, it would be nice to have it integrated into the buildsystem as it makes backporting patches in a package distribution context easier and it should also ease development. Also note that this is conceptually similar to https://github.com/justinethier/cyclone-bootstrap/pull/16.

Let me know if I misunderstood the bootstrapping process and if not if you are generally interested in this.

justinethier commented 3 years ago

The problem is that the compiler depends heavily on code in the libraries, as well as cyclone.scm. So we have to bootstrap just about the whole system regardless. Also, depending on what is changed, it may be trivial to bootstrap a fix (typically the case) or it may be more difficult (EG: if a key data type or call signature was modified).

Anyway, that is why any changes made to the main repo are synced and committed to the bootstrap repo as part of our dev cycle. That way you can pull and build the latest from bootstrap to get your changes, and we guarantee everything will work as expected. The goal is to make it as easy as possible to pick up fixes.

So for example if you take the latest from bootstrap you will pick up the fix for #473.

It would probably be best to create packages using cyclone-bootstrap since everything can be built using existing tooling (gcc, make, etc). And only the single source repository is needed.

Does that help?

nmeum commented 3 years ago

The problem is that the compiler depends heavily on code in the libraries, as well as cyclone.scm. So we have to bootstrap just about the whole system regardless.

I am aware of that, I don't want to abolish cyclone-bootstrap. I just think after cyclone.scm is bootstrapped (e.g. using cyclone-bootstrap) it should be possible to use it to compile the rest of the codebase (libraries, icyc, …) e.g.:

diff --git a/Makefile b/Makefile
index 8363acf6..0b128012 100644
--- a/Makefile
+++ b/Makefile
@@ -4,8 +4,10 @@

 include Makefile.config

+CYCLONE_SYSTEM = cyclone -A .
+CYCLONE_LOCAL = ./cyclone -A . -COPT '-Iinclude' -CLNK '-L.'
+
 # Commands
-CYCLONE = cyclone -A .
 CCOMP = $(CC) $(CFLAGS)
 INDENT_CMD = indent -linux -l80 -i2 -nut

@@ -146,12 +148,12 @@ doc :
 .PHONY: clean full bench bootstrap tags indent debug test doc

 game-of-life :
    cd $(EXAMPLE_DIR)/game-of-life ; $(MAKE)
@@ -162,13 +164,13 @@ hello-library/hello :
 libs : $(COBJECTS)

 $(COBJECTS) : %.o: %.sld
-   $(CYCLONE) $<
+   $(CYCLONE_LOCAL) $<

 cyclone : cyclone.scm $(CYC_RT_LIB) $(CYC_BN_LIB)
-   $(CYCLONE) cyclone.scm
+   $(CYCLONE_SYSTEM) cyclone.scm

 icyc : icyc.scm $(CYC_RT_LIB) $(CYC_BN_LIB)
-   $(CYCLONE) $<
+   $(CYCLONE_LOCAL) $<

 $(CYC_RT_LIB) : $(CFILES) $(HEADERS) $(CYC_BN_LIB)

So for example if you take the latest from bootstrap you will pick up the fix for #473.

It would probably be best to create packages using cyclone-bootstrap since everything can be built using existing tooling (gcc, make, etc). And only the single source repository is needed.

Sure, but cherry-picking/backporting changes from the cyclone-bootstrap repository would boil down to including diffs for autogenerated C code as part of the Alpine package recipe which is a bit iffy. As such, I actually think it's preferable to have a cyclone-bootstrap package and the use that to bootstrap the cyclone code in this repository and ship that in a separate package.

justinethier commented 3 years ago

Regarding the makefile, I see what you are saying. Let me see about setting something up.

Sure, but cherry-picking/backporting changes from the cyclone-bootstrap repository would boil down to including diffs for autogenerated C code as part of the Alpine package recipe which is a bit iffy. As such, I actually think it's preferable to have a cyclone-bootstrap package and the use that to bootstrap the cyclone code in this repository and ship that in a separate package.

Apologies for not being more familiar with your distro. How bleeding edge is Alpine? Wouldn't it make more sense to create packages based on official releases of Cyclone rather than diffs? For example, I am going to release 0.32.0 soon which will include the fix for #473. At that point is it still necessary for you to include diffs? Or are there other things that you need to customize for Alpine?

Which brings up another off-topic point - how are you handling stack size? Because Alpine's default 128 KB thread stack size is much lower than the 500 KB that Cyclone uses out of the box. Maybe this hasn't been a problem for single-threaded Cyclone programs since the main process has 8 MB of stack space.

justinethier commented 3 years ago

Regarding stack size, I am thinking about setting up a makefile-based build option to control using pthread_attr_setstacksize to set a sufficiently-large stack size for Cyclone threads. Do you have any concerns/comments on that solution?

nmeum commented 3 years ago

Regarding the makefile, I see what you are saying. Let me see about setting something up.

Keep in mind though that this may require some tinkering with Makefile prerequisite in order to work properly, see my original post.

How bleeding edge is Alpine?

We have a rolling release variant called Alpine Linux Edge which is used for development and everything 6 months we make a new stable release of Alpine.

Wouldn't it make more sense to create packages based on official releases of Cyclone rather than diffs?

We are creating packages based on offical releases of upstream software. However, occasionally we need to backport patches from upstream to an existing release. So for example, if you fix a critical security or functionality issue in Cyclone Git HEAD and don't immediately make a new release we backport the patch for this issue to the Cyclone release we are packaging. This is what I did for #473 (i.e. backport my fix to the 0.31.0 Cyclone release). This is especially important for Alpine stable releases where we aim for offering 6 months of support for the packaged software releases.

We try to avoid shipping releases with patches but for being able to backport security/functionally fixes we do need the ability to backport patches and backporting fixes by diffing autogenerated C code is not really an option, thus the need to package both cyclone-bootstrap and cyclone.

For example, I am going to release 0.32.0 soon which will include the fix for #473. At that point is it still necessary for you to include diffs? Or are there other things that you need to customize for Alpine?

With 0.32.0 Cyclone should work fine on Alpine without any additional patches. At least for now :)

Which brings up another off-topic point - how are you handling stack size?

I have only used Cyclone with single-threaded programs on Alpine so far, thus I probably haven't run into this issue yet, but Alpine/musl does indeed use a smaller pthread stack size by default. Changing this default via a Makefile configuration option and pthread_attr_setstacksize sounds good to me.

nmeum commented 3 years ago

Regarding commit 15a8f2cfe51bffa2844159cd267bd896a1c7ed46 I noticed the following while testing #477: The -COPT/-CLNK option causes compiler/linker options to be appended instead of prepended. For example:

$ cyclone -d -A . -COPT '-Iinclude' -CLNK '-L.' test.scm
gcc test.c  -O2 -Wall -I/usr/include -c -o test.o -Iinclude
gcc test.o  /usr/lib/cyclone/scheme/cyclone/common.o  /usr/lib/cyclone/scheme/base.o  /usr/lib/cyclone/scheme/write.o  -pthread -lcyclone -lck -lm -lcyclonebn -ldl  -O2 -fPIC -Wall -Wno-shift-negative-value -Wno-unused-command-line-argument -I/usr/include -L/usr/lib -Wl,--export-dynamic -o test -L.

However, this causes a problem with $(CYCLONE_LOCAL) as $(CYCLONE_LOCAL) adds -L. to use the the newly compiled libcyclone.a over the system-wide installed libcyclone.a, however, if libcyclone.a is present in /usr/lib (which will be searched first it) the $(CYCLONE_LOCAL) invocation will not pick up the new libcyclone.a version. If -L. would be added to the linker command-line before -L/usr/lib the local libcyclone.a should take precedence. Same with the -Iinclude from -COPT.

Not sure if it is preferable to always prepend -COPT/-CLNK as this may prevent overwriting other options such as optimization level where the last option takes precedence, if I recall correctly.