rtoy / maxima

A Clone of Maxima's repo

Simultaneous Compilation in parallel maxima processes fails #798

Open rtoy opened 3 months ago

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:15 Created by peterpall on 2021-01-06 10:31:11 Original: https://sourceforge.net/p/maxima/bugs/3699


Debugging this one took a lot of time because it was a real heisenbug: it triggered reliably on the CI, but only if I didn't ask the CI to provide much debug information.

One symptom was:

21:24:54: Debug: Failed to shown notification: Failed to execute child process “dbus-launch” (No such file or directory)
21:24:54: Failed to shown notification: Failed to execute child process “dbus-launch” (No such file or directory)

Your C compiler failed to compile the intermediate file.
loadfile: failed to load /usr/share/maxima/5.43.2/share/draw/draw.lisp
 -- an error. To debug this try: debugmode(true);

Another symptom, which showed up only after looking at what felt like 100 CI runs:

;; Note: Tail-recursive call of BIPART was replaced by iteration.
;; Note: Tail-recursive call of BIPART was replaced by iteration.Message from maxima's stderr stream: /home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14028:1: warning: null character(s) ignored
14028 | object V1853;object V1854;
      | ^
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c: In function ‘LI61’:
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14028: error: expected declaration specifiers before ‘)’ token
14028 | object V1853;object V1854;
      | 
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14029:2: error: expected declaration specifiers before ‘vs_top’
14029 | {  VMB70 VMS70 VMV70
      |  ^~~~~~
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14030:2: error: expected declaration specifiers before ‘if’
14030 |  goto TTL;
      |  ^~
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14033:2: error: expected declaration specifiers before ‘V1903’
14033 |  goto T5158;
      |  ^~~~~
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14034:2: error: expected declaration specifiers before ‘goto’
14034 |  }
      |  ^   
/home/runner/work/wxmaxima/wxmaxima/build/.maxima/binary/5_43_2/gcl/GCL_2_6_12/share/draw/gnuplot.c:14036:2: error: expected declaration specifiers before ‘goto’
14036 |  goto T5158;
      |  ^~~~

The cause: two maxima processes were trying to compile draw at the same time. One lisp re-generated the C source files while the other's C compiler was starting up. This caused a compilation failure that left a non-loadable file in maxima's binary folder.

Three remedies I can think of:

  1. Add a locking mechanism for maxima's load- and compile-type commands.
  2. Make sure that during compilation all filenames include maxima's pid and are therefore unique. After compilation the result can be moved to its final destination in an atomic operation, which means that the binary is consistent whenever it is present. To support certain virus scanners on MS Windows, we have to check manually whether the move really happened after the move operation has claimed success, and repeat it if not: while a virus scanner is still scanning a file on MS Windows, the file cannot be moved, but since this is neither a permission problem nor a missing source or destination, the move command cannot report it.
  3. Close this ticket with a "fixme: is a problem of the lisp compiler". I am not sure if it actually is.
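
A minimal sketch of remedy 2, in Python rather than in maxima's own lisp code. The function name `publish_atomically`, the `.tmp` naming scheme, and the retry parameters are assumptions for illustration; `data` stands in for whatever the compiler would write. It shows the general pattern only: write under a pid-unique name, move into place with one rename, then verify the move instead of trusting it.

```python
import os
import time


def publish_atomically(data: bytes, final_path: str,
                       retries: int = 5, delay: float = 0.2) -> None:
    """Write `data` under a pid-unique name, then move it into place."""
    # Hypothetical naming scheme: same directory, pid in the name.
    tmp_path = f"{final_path}.{os.getpid()}.tmp"
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
    try:
        for _ in range(retries):
            try:
                # A single rename; atomic on POSIX within one filesystem.
                os.replace(tmp_path, final_path)
            except PermissionError:
                # For example, a scanner may still hold the file open.
                pass
            # Do not trust the move blindly: check that it really happened.
            if os.path.exists(final_path) and not os.path.exists(tmp_path):
                return
            time.sleep(delay)
        raise OSError(f"could not move {tmp_path} to {final_path}")
    finally:
        # Never leave a half-published temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Because the temporary file lives in the destination directory, the rename never crosses a filesystem boundary, and a reader sees either no binary or a complete one.
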
rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:16 Created by l_butler on 2021-01-06 12:42:52 Original: https://sourceforge.net/p/maxima/bugs/3699/#4ec1


Here are a few random thoughts:

  1. Wouldn't it make more sense to pre-compile all the packages that need it? To avoid duplication of effort?

  2. I wonder if this bug report should be filed with GCL? Based on your description, it seems like this collision can/will happen in parallel compilation with GCL (e.g. with Axiom).

  3. Do you know why this does not affect other maxima+lisp combinations?

  4. Use of pids to name files uniquely is decidedly inferior to mktemp and relatives.
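
To illustrate point 4, here is a small Python sketch of the mktemp-style approach. The helper name `unique_build_path` and the `.tmp` suffix are hypothetical. `tempfile.mkstemp` creates the file exclusively, so uniqueness does not depend on pids, which can be reused over time or can coincide between machines that share a home directory.

```python
import os
import tempfile


def unique_build_path(directory: str, stem: str) -> str:
    """Create an empty, uniquely named file in `directory`; return its path."""
    # mkstemp both picks the name and creates the file in one step,
    # so two processes can never be handed the same path.
    fd, path = tempfile.mkstemp(prefix=stem + ".", suffix=".tmp", dir=directory)
    os.close(fd)
    return path
```
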

Leo

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:20 Created by peterpall on 2021-01-06 13:00:37 Original: https://sourceforge.net/p/maxima/bugs/3699/#d629


  1. or moving over draw from share into maxima's src directory?
  2. I didn't test this with ecl and sbcl. But you might be entirely right in this point.
  3. see 2.
  4. Good idea.
rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:23 Created by kjak on 2021-01-06 14:05:15 Original: https://sourceforge.net/p/maxima/bugs/3699/#ea20


Is [#3698] the same as this one or is it different? At first glance it looks like you made two essentially identical tickets.

If they're different then I think you should clearly explain the difference. If they're the same then we should close [#3698] since there is some activity here.

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:27 Created by peterpall on 2021-01-06 15:02:06 Original: https://sourceforge.net/p/maxima/bugs/3699/#ea20/f523


I reported this bug twice because SourceForge told me that I had triggered an internal error, and when I looked at the bug list, the bug I had tried to report didn't appear (perhaps it just hadn't appeared yet). Now bug #3698 reliably crashes my Firefox, so if you are able to close #3698, I would beg you to do so for me.

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:30 Created by kjak on 2021-01-06 15:13:36 Original: https://sourceforge.net/p/maxima/bugs/3699/#ea20/f523/b612


I've closed [#3698]. That report is huge and that may be the cause of the problems you've seen.

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:34 Created by l_butler on 2021-01-06 16:43:45 Original: https://sourceforge.net/p/maxima/bugs/3699/#d629/59e6


  1. or moving over draw from share into maxima's src directory?

I don't think that will necessarily fix the problem you see. For example, lapack (and others) are compiled on first loading.

It may make more sense to create a custom image of maxima that already contains all the packages you want to test, before you start the testing.

That may be independently desirable: being able to configure the build process to create a "core" image or some "maximal" image might be useful. STACK, for example, offers some options to use an "optimized" maxima image, but as far as I know there is no testing of that image.

  2. I didn't test this with ecl and sbcl. But you might be entirely right in this point.

Yes, my guess is that the collisions will happen with all lisps that compile lisp code, as long as you are running the same lisp in parallel jobs.
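
If the collision really is independent of the lisp, the locking idea from the original report would serialize the parallel jobs in all cases. The following is only a sketch in Python under stated assumptions: `compile_lock` and the lock-file path are hypothetical names, and a lock left behind by a killed process is not handled.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def compile_lock(lock_path: str, timeout: float = 60.0, poll: float = 0.1):
    """Hold an exclusive lock file while the body of the `with` runs."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner's pid to help with manual debugging.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A job would wrap its compile step in `with compile_lock(path):`, so a second process waits until the first has finished writing its output.
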

Leo

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:37 Created by dauti on 2021-01-06 17:24:54 Original: https://sourceforge.net/p/maxima/bugs/3699/#4ec1/59f3


If every Lisp file were compiled as part of a complete Maxima build, Maxima would not build on, e.g., 32-bit SBCL. (I don't know whether other Lisps are affected too; maybe CMUCL, since SBCL is a fork of it.)

Compiling Lapack requires more memory than is available there, so that package cannot be loaded there.

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:41 Created by peterpall on 2021-01-06 18:04:03 Original: https://sourceforge.net/p/maxima/bugs/3699/#ea20/f523/b612/6982


Thanks a lot!

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:45 Created by peterpall on 2021-01-06 18:12:24 Original: https://sourceforge.net/p/maxima/bugs/3699/#d629/59e6/dcc4


Working around the problem isn't too hard once one has found its cause. What finally made me report it was a different aspect of the problem, though: if a user ever triggers it, the remains of the unsuccessful compilation linger in the user's binary folder. That means the user will never again be able to load this package, unless the binary folder is manually deleted or is invalidated by a maxima update or a lisp change. A race condition that isn't easy to trigger but permanently breaks a package for a user isn't nice.

Fortunately, if sbcl runs out of memory while compiling a package, no remains of the compilation attempt are left behind. Or at least I have never seen any.
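
One way to keep a broken leftover from blocking a package forever is to treat the binary folder as a cache that can repair itself. The sketch below is in Python and is not how maxima currently behaves; `load_with_recovery`, `compile_fn`, and `load_fn` are hypothetical names standing in for the real compile and load steps.

```python
import os
from typing import Callable


def load_with_recovery(binary_path: str,
                       compile_fn: Callable[[str], None],
                       load_fn: Callable[[str], None]) -> None:
    """Load a cached binary; if it is missing or broken, rebuild it once."""
    if os.path.exists(binary_path):
        try:
            load_fn(binary_path)
            return
        except Exception:
            # Possibly stale output from an earlier failed build: discard it.
            os.remove(binary_path)
    compile_fn(binary_path)
    # A second failure propagates to the caller instead of looping.
    load_fn(binary_path)
```
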

rtoy commented 3 months ago

Imported from SourceForge on 2024-07-03 11:29:48 Created by peterpall on 2021-01-06 21:20:11 Original: https://sourceforge.net/p/maxima/bugs/3699/#d629/59e6/dcc4/1d76


On the other hand, and on second thought: the fact that I got an invalid object file with gcl, while with sbcl I got no object file at all in every case where compilation failed, might possibly be caused by a gcl bug. I don't have enough data about the failures to produce any statistics, though.