icecc / icecream

Distributed compiler with a central scheduler to share build load
GNU General Public License v2.0

What would it take to support additional languages? #470

Closed: zbeekman closed this issue 5 years ago

zbeekman commented 5 years ago

In particular, I'd really like to see/contribute GFortran support, but I'm not sure where to start. Could someone please point me in the right direction?

llunak commented 5 years ago

It works like this: only source -> object file compiles are distributed, so only commands in the form of 'gcc -c a.c -o a.o'. This is achieved internally by preparing the source locally with something like 'gcc -E a.c -o a.i'; the one resulting file is then shipped to the remote node, where something like 'gcc -c a.i -o a.o' is run and the resulting object file is sent back. In practice this is more complex, e.g. the remote compiler is packaged by the icecc-create-env script, but this is the basic idea, and you can even try these commands manually to see how it works.
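For anyone who wants to try it by hand, a minimal sketch of that split (file names are just placeholders):

    # local step: preprocess to a single self-contained file
    gcc -E a.c -o a.i
    # what the remote node effectively runs: compile the preprocessed file
    gcc -c a.i -o a.o
    # a.o is then sent back and linked locally as usual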

So the more closely Fortran compilation can be made to fit this model, the simpler it will be to add support for it. If there's a way to reduce the compiler input to a single file containing everything, then this can be started by extending CompileJob::Language in services/jobs.h and then, based on that, special-casing everything that needs special handling.
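Assuming gfortran can be driven the same way (an assumption; whether it really can is the open question here), the Fortran analogue of the split would look roughly like this:

    # hypothetical local step: preprocess to one file
    # (-cpp forces preprocessing even for lowercase .f90 sources)
    gfortran -cpp -E a.f90 -o a_pp.f90
    # hypothetical remote step: compile the preprocessed source
    gfortran -c a_pp.f90 -o a.o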

zbeekman commented 5 years ago

This may end up being considerably more complicated than I originally anticipated.

The chief problems are:

  1. There is no standardized preprocessing, and yet there is a language statement include "filename.ext". In practice, if any pre-processing is done, people usually assume C preprocessing, and gfortran -E works more or less as expected; however, I do not believe it will embed Fortran source code imported through intrinsic include statements.
  2. There may be one or more additional outputs from the compilation: the .mod (and now, as of Fortran 2008, .smod) module files. These are analogous to pre-compiled headers, kinda-sorta.
  3. Compilation order matters: because of the strong syntax and type checking dictated by the language standard, the .mod files produced when processing modules must already be present and accessible to the compiler when compiling any source that uses those modules (a minimal example follows this list).
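For readers unfamiliar with Fortran modules, here is a tiny, self-contained illustration of points 2 and 3 (the file, module, and variable names are made up):

    # create a module and a program that uses it
    printf 'module a\n  integer :: answer = 42\nend module a\n' > mod_a.f90
    printf 'program p\n  use a\n  print *, answer\nend program p\n' > uses_a.f90
    # compiling the module emits mod_a.o *and* a.mod
    gfortran -c mod_a.f90 -o mod_a.o
    # this compile needs a.mod to exist already; swap the order and it fails
    gfortran -c uses_a.f90 -o uses_a.o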

As a workaround for 1. it would be relatively straightforward to write a script to convert intrinsic include "foo.i90" statements to #include "foo.i90".
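Something like the following sed rule would cover the common case, assuming double-quoted file names and ignoring the case-insensitive INCLUDE spelling, single-quoted names, continuation lines, and includes inside strings or comments:

    # naive rewrite: turn   include "foo.i90"   into   #include "foo.i90"
    # so that the C preprocessor will expand it
    sed -E 's/^[[:space:]]*include[[:space:]]+"([^"]+)"/#include "\1"/' a.f90 > a_pp.f90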

Points 2. and 3. would be significantly harder to address. I can see two potential ways to approach this:

  1. Most compilers have a 'syntax only' flag that checks your syntax, for use with linters etc. Fortunately they produce .mod files as a by-product of parsing and checking the source. (And unfortunately, they require any .mod files associated with modules used by the source to be present already.) So, if running with, e.g., -fsyntax-only for gfortran is significantly cheaper than compilation, the .mod files could be generated in a first pass before sending them, along with the sources, to compilation nodes/workers (a rough sketch follows below). Not great as far as solutions go.
  2. Compute the compilation dependency graph (DAG) to split the compilation between workers. Each worker would get all .mod files produced so far, and would return the compiled object file (or files, if processing a given branch of the DAG) along with any newly produced .mod files.

The first workaround may yield poor speedups (or even slowdowns); the second would be the correct way to do this, but it will be a non-trivial task to accomplish.
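For what it's worth, a very rough sketch of the first approach, assuming the source list is already in a dependency-respecting order (in a real build that order would have to come from the build system or a dependency scanner):

    # pass 1 (local): generate .mod files cheaply, no object code
    SOURCES="mod_a.f90 mod_b.f90 main.f90"   # hypothetical, already ordered
    for f in $SOURCES; do
        gfortran -fsyntax-only "$f"
    done
    # pass 2 (distributable): real compiles; every needed .mod file now
    # exists locally and could be shipped alongside each source to a worker
    for f in $SOURCES; do
        gfortran -c "$f" -o "${f%.f90}.o"
    done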

ossilator commented 5 years ago

I think you'll find considerable overlap with the discussion in my "favorite" issue #138.

zbeekman commented 5 years ago

I suspect gfortran .mod files may be (relatively) smaller than PCHs, but it likely depends on the context. Under the hood they are just gzipped text files. (You can inspect them if you force gzip to process them.) But the bigger issue may be that they have to be shipped over for compilation, and new ones shipped back. At any rate, thanks for linking that other issue.
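For the curious, since .mod files lack a .gz suffix you have to feed the data to gzip via stdin (this assumes a reasonably recent gfortran, which compresses them; the file name is just an example):

    # dump the text inside a compiled module file
    gzip -dc < a.mod | head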

llunak commented 5 years ago

While there is support for sending extra files to the remote node, it is currently done by rebuilding the compiler tarball, so if these changes happen often, the overhead may be too big.
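If memory serves, icecc-create-env has an --addfile option for exactly this; the rebuild would then look something like the line below, but treat the option and the compiler paths as assumptions to check against your icecream version:

    # hypothetical: bake an extra file into the compiler environment tarball
    icecc-create-env --gcc /usr/bin/gcc /usr/bin/g++ --addfile /path/to/a.mod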

Regardless, I'm afraid there's not much more we can do for you at this time. If you have more questions, please reopen and ask.