mgorges opened this issue 4 years ago
I assume we could explore caching the build/*.c files globally, but I am afraid I will get yelled at again if I break things doing so, so it is not a priority for the time being. - So, yes, it is possible in principle but the gains are marginal?
If I look at how much time one might save: for macOS, compiling DemoRedSquare from scratch takes about 11 sec for the SCM piece and 15 sec for the C piece; adding cross-compiling for win32, the numbers are about 0 sec for the SCM piece (as it's already done) and 708 sec for the C piece. If I clear again and make android (api21), the numbers are 13 sec (SCM) and 85 sec (C) for arm, then 0 sec (SCM) and 25 sec (C) for arm64, and finally 0 sec (SCM) and 15 sec (C) for x86_64. --> So in the big picture the impact is somewhere around 10-50 seconds saved?
This is the patch I used to benchmark it: https://gist.github.com/mgorges/35be89a793963e018601f055b29bb562; as I wrote, I am not willing to merge this right now, but it is something to think about for the future.
On Sat, 26 Sep 2020 02:42:32 -0700, Matthias Görges notifications@github.com wrote:
I assume we could explore caching the build/*.c files globally, but I am afraid I will get yelled at again if I break things doing so, so it is not a priority for the time being. - It is possible in principle and the gains are marginal?
Those gains are indeed marginal, at least with respect to compile time. And the disk-space savings are IMHO neither worth the development time nor the added source complexity.
Furthermore, as I mentioned, since I've seen how those C files are cached I have started to rely on that fact:
I know there are not many people on your team at this time who use cond-expand or other tricks to conditionally expand Scheme code. But the feature is much more versatile, usable, less error prone and, in a way, more powerful than the C-level alternative of a busload of #ifs.
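As a minimal sketch of the kind of conditional expansion meant here (the feature identifiers below are illustrative, not LambdaNative's actual feature set):

;; Conditional expansion at the Scheme level: the expander picks one
;; branch per target, so the generated C differs between targets.
;; The feature identifiers (android, ios) are illustrative only.
(cond-expand
  (android (define (log-sink) "logcat"))
  (ios     (define (log-sink) "oslog"))
  (else    (define (log-sink) "stderr")))

;; The C-level alternative would be a generated file full of
;; #if defined(ANDROID) ... #elif defined(IOS) ... #endif blocks --
;; the "busload of #ifs" referred to above.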
Therefore I'd vote for leaving things as they are. It was just me wondering when I noticed it for the first time. But it's good as it is.
Gaining 30 sec every time you build for Android (or iOS, which also builds multiple architectures in parallel) is not nothing, so there is some benefit to this. Also, I already have a patch, so at this point the additional development time needed is zero and the source complexity is minimal; it separates a loop into two and updates some location variables - that is, if this is indeed the correct implementation. This shouldn't break anything for anyone, unless of course you do something to these files that is not part of the main branch.
On Sat, 26 Sep 2020 15:56:15 -0700, Matthias Görges notifications@github.com wrote:
Gaining 30 sec every time you build for Android (or iOS, which also builds multiple architectures in parallel) is not nothing, so there is some benefit to this. Also, I already have a patch, so at this point the additional development time needed is zero and the source complexity is minimal; it separates a loop into two and updates some location variables - that is, if this is indeed the correct implementation. This shouldn't break anything for anyone, unless of course you do something to these files that is not part of the main branch.
I do. I use cond-expand on the target and macros to read files and include their values (e.g., read a file, convert it into a u8vector, and spit out C source as the content of a c-declare).
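A rough sketch of that trick (the macro name is made up, and for simplicity it reads characters rather than raw bytes, which a real version would do via a u8vector):

;; Hypothetical sketch: embed a file's contents as C source via c-declare.
;; It runs at macro-expansion time, so the generated .c file depends on
;; the file's contents at build time.
(define-macro (include-file-as-c-array c-name path)
  (define (file->byte-list path)
    ;; Illustration only: reads characters and takes their code points;
    ;; a real implementation would read raw bytes into a u8vector.
    (call-with-input-file path
      (lambda (port)
        (let loop ((c (read-char port)) (acc '()))
          (if (eof-object? c)
              (reverse acc)
              (loop (read-char port) (cons (char->integer c) acc)))))))
  (let* ((bytes (file->byte-list path))
         (body  (apply string-append
                       (map (lambda (b) (string-append (number->string b) ","))
                            bytes))))
    `(c-declare
      ,(string-append "static const unsigned char " c-name "[] = {" body "0};"))))

;; e.g. (include-file-as-c-array "asset_data" "asset.bin") would make a C
;; array named asset_data available to adjoining c-declare / c-lambda code.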
Gaining a few seconds per compile would nevertheless be worthwhile! I.e., I'd welcome it.
So since you already have a patch and shouldn't waste your time - go ahead. The above trick is not used too frequently.
However, if you found a good spot to add the ability to specify per-backend C files and tell the C compiler where to #include them, that would be very helpful for converting my code.
Having thought about this a little longer, I've come to understand that I've been in error here.
We already have the ability to use cond-expand for a reason: we need to support multiple platforms. Given the volatility of Android we'll likely need much more. The trick used in IF_ANDROIDAPI_GT_22 does not scale or nest. While this has only been used for Java so far, cond-expand and the full power of Scheme are still superior to the #if pre-processor mess of C. Plus, I already have ideas in mind for how to do better for Android/Java code generation depending on the API version (e.g., to get EVENT_IDLE back while the app is inactive and the phone is in doze mode).
Chances are that we will want to support these future enhancements and not break ssax, htmlprag, jScheme, debug, and the gambit version detection in config all at the same time - at least.
IMHO that's worth those annoying 30 seconds.
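To make the scaling/nesting point concrete, here is a minimal sketch of API-version-dependent expansion (the feature identifiers android and android-api-26 are hypothetical):

;; cond-expand nests naturally, unlike the Java-side IF_ANDROIDAPI_GT_22
;; trick.  Feature identifiers here are hypothetical.
(cond-expand
  (android
   (cond-expand
     (android-api-26
      (define (idle-strategy) 'foreground-service))
     (else
      (define (idle-strategy) 'wakelock))))
  (else
   (define (idle-strategy) 'timer)))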
However, if you found a good spot to add the ability to specify per-backend C files and tell the C compiler where to #include them, that would be very helpful for converting my code.
I don't have a straightforward idea for this. It would require interpreting the code to identify whether a recompilation of the C piece is needed? At the moment it only checks the age of the scm file to decide whether the C piece needs to be regenerated, and the age of the C piece to see whether we need to recompile the object file.
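For reference, the age check described here boils down to plain timestamp comparisons; a minimal sketch of that logic in Gambit Scheme (the actual check lives in the shell build scripts, and the file names are placeholders):

;; Sketch of the age-based dirty check, using Gambit's
;; file-last-modification-time.  File arguments are placeholders.
(define (newer? a b)
  (> (time->seconds (file-last-modification-time a))
     (time->seconds (file-last-modification-time b))))

(define (needs-rebuild? scm-file c-file obj-file)
  (cond ((not (file-exists? c-file))   'regenerate-c)
        ((newer? scm-file c-file)      'regenerate-c)
        ((not (file-exists? obj-file)) 'recompile-object)
        ((newer? c-file obj-file)      'recompile-object)
        (else                          'up-to-date)))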
Having thought about this a little longer, I've come to understand that I've been in error here.
So should I not merge this, but rather close the issue? Or do we want the separation of the loop into two included for future use?
I contemplated that question for another while. Those 30 seconds are still annoying (especially to people like me who use match a lot, which involves macro expansion, and who thus get bitten by longer gsc runs all the time). Chances are that I'd be the big winner if that were merged.
The alternative solution would be to expand code producing inlined C code into a generated #ifdef hell. Tricky, and it takes developer time once, but it saves seconds each time. But that's a rare case - so I'll ignore this.
Still, caching the generated C files without regard to the options used when generating them will break things conceptually.
To summarize: I'd tend to just close the issue right now for good. On principle, caching them appears to me to be the wrong thing to do. Likely it will bite us later.
But it's late today. Maybe keep the changes around, just in case sleeping on it lets me come up with a way to cache those anyway. (However unlikely that seems to me right now.)
Update: the real solution to the issue would be to collect all the relevant configure/setup/API/platform options into a single hash (as done in bitbake/yocto) and maintain a cache keyed on that hash. But that would be a major rewrite.
Hence again: close it for good!
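For concreteness, keying a cache on such a hash might look roughly like this (the option names and the FNV-1a digest are purely illustrative; bitbake's actual mechanism is far more elaborate):

;; Illustrative cache key: fold every option that influences code
;; generation into one string and hash it.
(define (fnv-1a str)
  (let loop ((i 0) (h 2166136261))
    (if (= i (string-length str))
        h
        (loop (+ i 1)
              (bitwise-and #xffffffff
                           (* 16777619
                              (bitwise-xor h (char->integer (string-ref str i)))))))))

(define (cache-key options)   ;; options: list of "NAME=VALUE" strings
  (number->string (fnv-1a (apply string-append options)) 16))

;; e.g. (cache-key '("PLATFORM=android" "APILEVEL=21" "MODE=debug"))
;; names a per-configuration directory under which the generated .c
;; files could be cached.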
2nd update: this is not saying "use bitbake" -- I've been working with it. It's a hell in itself, since it gets things wrong that are gotten right within lambdanative.
Update: the real solution to the issue would be to collect all the relevant configure/setup/API/platform options into a single hash (as done in bitbake/yocto) and maintain a cache keyed on that hash. But that would be a major rewrite.
That sounds like it would add even more redundant files than we already have? And it also means that we can't use a single SCM->C compilation step shared between platforms, as they would necessarily have different hashes if the platform is part of it.
On Mon, 28 Sep 2020 12:46:39 -0700, Matthias Görges notifications@github.com wrote:
That sounds like it would add even more redundant files than we already have? And it also means that we can't use a single SCM->C compilation step shared between platforms, as they would necessarily have different hashes if the platform is part of it.
Yes, that's what would happen. That's what happens with bitbake.
It has pros and cons.
If you want to support multiple configurations, or frequently build debug versions with additional code (double checks, trace logs, REPLs) and production versions without all the overhead and backdoors, then it is actually great: fast development cycles.
The con: disk space had better be cheap.
Wrt. a shared SCM->C compilation step: as much as I'd like to shorten some compiles, I simply cannot see that this is going to do good. I foresee all too many scenarios where it gets in the way. So if you were to add that, please make it at least optional.
I like the idea of making it optional, but I am wondering if we should make it the default? E.g., you could declare TARGETED_C_PATH=yes if you want to opt out, but everyone else would benefit from the compilation time improvement? The revised patch is here: https://gist.github.com/mgorges/037acf44b4ac580e6101fa54acabe2cf
On Tue, 29 Sep 2020 22:01:22 -0700, Matthias Görges notifications@github.com wrote:
I like the idea of making it optional, but I am wondering if we should make it the default?
There might be competing arguments to consider.
In principle, Scheme is a language that supports compile-time code transformation, like conditional expansion and macros.
It appears questionable to kill that feature for an unrelated benefit.
Don't weigh the use cases you have encountered so far too heavily. Chopping off a leg still allows you to get from A to B. But you'll never learn how much faster you could have been.
Do you know how many experienced Schemers just use lambdanative as it is? I, at least, have recommended it publicly before.
A casual user unknowingly relying on that, or a Schemer trying to learn LN, might be really disappointed when running into horribly strange bugs, only to figure out that something deeply hidden under the hood of an "obscure build system" (as they will likely feel LN is at that point) has broken their carefully tested code.
Making it the default will change things suddenly. I still suffer from the last one of those; see the API26 issue. Plus: it might break things.
Hence, once it is included, I'd recommend NOT making it the default right now. Advertise it to testers and wait for feedback first.
E.g., you could declare TARGETED_C_PATH=yes if you want to opt out, but everyone else would benefit from the compilation time improvement? The revised patch is here: https://gist.github.com/mgorges/037acf44b4ac580e6101fa54acabe2cf
The problem with this solution would be that it is all-or-nothing.
That's IMHO worse than nothing. It requires a decision whose consequences are impossible to forecast - reducing it to "I bet...".
The decision whether or not to enable the shared C cache must be made per compilation unit (i.e., per module).
Instead of a SETUP variable, how about yet another one of those LN-specific files in the module's directory?
Well, to flip this around: if I make it optional, nobody will use it, as they might either not know this is possible in the first place or not know what the benefit might be, and as such are naturally going to avoid adding some risky experimental feature that nobody else uses. Only if we put it in will people test it, complain, and we will fix it.
We have no idea who uses the system and for what purpose, as there isn't much of a community that contributes to the system. To me, a 45 sec compilation-time gain for things I recompile a lot when putting display in places - as LN doesn't support a proper debugger with breakpoints and inspection/changes, like other languages - is a pretty significant improvement?
Finally, with respect to system-wide versus modular, again we have differing opinions. From my perspective, what you seem to need is an opt-out, not an opt-in. In the end, there are likely only very few modules that don't want this, and randomly sprinkling a new required file into every single module makes no sense to me.
Well, to flip this around: if I make it optional, nobody will use it, as they might either not know this is possible in the first place or not know what the benefit might be
Nag them with a message until they head for their SETUP and silence the warning - either by declaring "stop shouting and use it as given", or by trying it out, learning about the benefit, and making an informed choice.
and as such are naturally going to avoid adding some risky experimental feature that nobody else uses. Only if we put it in will people test it, complain, and we will fix it.
Yeah, or they find it too annoying that things break here and there, and stop contributing.
Better to nag.
We have no idea who uses the system and for what purpose, as there isn't much of a community that contributes to the system. To me, a 45 sec compilation-time gain for things I recompile a lot when putting display in places - as LN doesn't support a proper debugger with breakpoints and inspection/changes, like other languages - is a pretty significant improvement?
As lambdanative mostly uses gambit, people (like me) might have already tampered with the scripts to get these things done, without having had the time to clean them up to the point where they are worth a review.
While walking the dog it occurred to me that I actually really do have a use case for more control over the cache.
While profiling I need to switch between a more or less heavily instrumented debug version and a "production" version.
Guess how often I'm not sure that I re-compiled one particular module or another.
Finally, with respect to system-wide versus modular, again we have differing opinions. From my perspective, what you seem to need is an opt-out, not an opt-in. In the end, there are likely only very few modules that don't want this, and randomly sprinkling a new required file into every single module makes no sense to me.
Let's not have random sprinkling. That is, let's aim for me not needing randomly sprinkled overrides, and for everyone else not needing them either.
How about the simplest solution I came up with while working on it: have an override_compile_command.sh per module directory. Source it if it exists and skip the default compile; otherwise use the default. (Still, better to nag than to force people silently - they might not like the latter.) The override would be "garbage in, garbage out".
NB: This might also be the anchor to solve the extra make run from the other issue.
Something along the lines of this (untested) patch:
modified languages/scm.sh
@@ -114,6 +114,12 @@ compile_payload_scm()
fi
fi
fi
+ override_compile="$scm_path/"override_compile_command.sh
+ final_compile_command_scm="$SYS_GSC -:~~tgt=${SYS_PREFIX} -prelude \"$scm_opts\" -c -o $scm_ctgt $gsc_processing $scm_hdr $scm_src"
+ final_compile_command_c="$SYS_ENV $SYS_CC $payload_cdefs -c -o $scm_otgt $scm_ctgt -I$SYS_PREFIX/include -I$SYS_PREFIX/include/freetype2 -I$scm_path"
+ if [ -f ${override_compile} ]; then
+ source ${override_compile}
+ else
if [ $scm_dirty = yes ]; then
echo " $scm_src .."
scm_link_dirty=yes
@@ -121,13 +127,14 @@ compile_payload_scm()
if [ -f $scm_hdr ]; then scm_hdr="-e '(load \"$scm_hdr\")'"; else scm_hdr=""; fi
gsc_processing=""
# if [ $SYS_VERBOSE ]; then gsc_processing="$gsc_processing -expansion"; fi
- veval "$SYS_GSC -:~~tgt=${SYS_PREFIX} -prelude \"$scm_opts\" -c -o $scm_ctgt $gsc_processing $scm_hdr $scm_src"
+ veval "${final_compile_command_scm}"
if [ $veval_result != "0" ]; then rmifexists "$scm_ctgt"; fi
assertfile "$scm_ctgt"
rmifexists "$scm_otgt"
- veval "$SYS_ENV $SYS_CC $payload_cdefs -c -o $scm_otgt $scm_ctgt -I$SYS_PREFIX/include -I$SYS_PREFIX/include/freetype2 -I$scm_path"
+ veval "${final_compile_command_c}"
assertfile "$scm_otgt"
fi
+ fi
scm_csrcs="$scm_csrcs $scm_ctgt"
payload_objs="$payload_objs $scm_otgt"
done
Note that this is far from enough - the $dirty handling would likely be off. It's just to explain the idea.
Having thought about this a little longer, let me suggest NOT making this the default until solutions for #339, #338 and #330 have been merged.
Having those solved first might inform us about potential shortcomings we would otherwise miss.
I believe #330 should be the most beneficial for the general audience, your team, and whoever else.
For me: I'm mostly held back by #339, as that's what broke my deadlines.
Update: with the current state of affairs I could not even merge and test it. Right now there are just too many trace/test/experimental modifications in the build process here.
More remarks. Let's hope I keep this short. (And please forgive me if it goes against politeness.)
Schemers are rare. Whoever looks closer into LN is likely interested in a certain use case, or is otherwise not immediately put off by the hell of whitespace written as "(" or ")".
LN essentially depends on gambit. A newcomer -- having either just learned about Scheme (because of the "use case" above) or having prior knowledge of Scheme and thus looking closer -- will likely try things out and make their way from that point if everything works well.
Supporting the expectation that Scheme, in the variant documented by Gambit, actually works is IMHO a reasonable assumption.
Breaking that expectation, especially in a rather random way that is hard to debug even for experts, will put both of them off.
The assumption that users don't use Scheme with the full power of gambit, and instead have just modified a few lines of the published examples, would be rather naive.
So far: the whole thread started out with my casual remark that such caching could be possible under certain circumstances, which happen to be accidentally met by a good deal of lambdanative code.
Right in the same context I added that this saving comes at a huge risk: it nails the current flexibility of LN down to those features already in use up to today. Since then I've been trying to say that a modification of the caching might be beneficial.
But that should not restrict. It should enable. Caches can be scrubbed.
Update: wrt. "Caches can be scrubbed" -- if that's not too frequent and is under the user's control, that is. ;-)
Gambit itself is actually target agnostic. (Which made me - originally - question why the .scm source is compiled anew for each target. According to gambit the result is supposed to be identical.)
Originally posted by @0-8-15 in https://github.com/part-cw/lambdanative/pull/321#issuecomment-699136367