thheller / shadow-cljs

ClojureScript compilation made easy
https://github.com/thheller/shadow-cljs
Eclipse Public License 1.0
2.25k stars 177 forks source link

VM crash when watching a project #1015

Closed Kineolyan closed 3 months ago

Kineolyan commented 2 years ago

On a personal project [1] building a simple website, I am having VM crashes (with hs_err files) with almost every watch command. I tested various configured VMs, but it resulted in the same crashes. Tested:

Trying to reproduce it, I saw it work once or twice. But as soon as it starts crashing, I cannot find ways to restore a working build, except from whipping the project and re-cloning it from scratch (a process a bit cumbersome). One of the hs_err file here [2]

I am also reproducing this issue in a docker container. These are the steps:

# with podman ...
podman run -it --rm docker.io/cimg/openjdk:17.0-node /bin/bash
# ... or with docker
docke run -it --rm cimg/openjdk:17.0-node /bin/bash

# then for the real test
git clone https://gitlab.com/kineolyan/reading-buffer
yarn install
npx shadow-cljs watch app

This worked successfully in the past. I remember starting seeing crashes months ago but restarting was enough to work this around. I am ok to investigate it but I really don't know where to start. Thanks for your assistance

[1] https://gitlab.com/kineolyan/reading-buffer.git [2] https://drive.google.com/file/d/1Ph02YBAiD7aqXaOKF-4C384IhKAjpGUl/view?usp=sharing


In addition, let's note the following points:

I tried with the latest version of shadow-cljs, but no luck.

1 warning about graal features, when not using a GraalVM build:

[To redirect Truffle log output to a file use one of the following options:
* '--log.file=<path>' if the option is passed using a guest language launcher.
* '-Dpolyglot.log.file=<path>' if the option is passed using the host Java launcher.
* Configure logging using the polyglot embedding API.]
[engine] WARNING: The polyglot context is using an implementation that does not support runtime compilation.
The guest application code will therefore be executed in interpreted mode only.
Execution only in interpreted mode will strongly impact the guest application performance.
For more information on using GraalVM see https://www.graalvm.org/java/quickstart/.
To disable this warning the '--engine.WarnInterpreterOnly=false' option or use the '-Dpolyglot.engine.WarnInterpreterOnly=false' system property.

Another warning from SLF4J:

[:app] Compiling ...
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
thheller commented 2 years ago

Hmm I don't have any guesses for you. Segmentation faults are way beyond my expertise and can be caused by pretty much everything in a System (eg. faulty RAM).

Does this happen when you run it without the container?

The graal errors should be gone in the latest shadow-cljs version. They are irrelevant though. The SLF4J warnings you can ignore, they are caused by some libraries you use, not by shadow-cljs.

Kineolyan commented 2 years ago

Yes, it happens inside and outside the container. I mainly work without a container and I tested it for the sake of completeness, to be sure that it was not only my machine. Would you have any document explaining how shadow-cljs works? I can always try to look at it on my own. Do you have any recommended version of the JVM to use with shadow-cljs or is it supposed to work whatever the version?

thheller commented 2 years ago

Only guess I have is related to this one https://github.com/google/closure-compiler/issues/3945.

You appear to be using highlight.js via 10x, so maybe on your particular OS/JVM combo it just segfaults instead of throwing a StackOverflow. Maybe try upgrading 10x to [day8.re-frame/re-frame-10x "1.2.8"], I believe that addresses the highlight.js issue.

As for JVM anything from Java8 is supported. I don't test anything in particular but given the JVM stability I'm fairly confident they all work. Newer is of course prefered. I personally use Java17, which I would recommend given its the latest LTS release.

Kineolyan commented 2 years ago

Many thanks @thheller for https://github.com/google/closure-compiler/issues/3945. I saw that kind of error when running with JDK8 instead of JDK17. I will try to upgrade, as you suggested.

I am also fairly confident about the JVM. Thanks anyway for sharing your config.

Kineolyan commented 2 years ago

I had a recent branch where re-frame-10x was already update to 1.2.8 with the same effect. However, increasing the JVM heap as suggested in this comment of google/closure-compiler#3945 made it work. For the record, for future readers, I suggest setting the JVM options via JAVA_TOOL_OPTIONS="-Xms.. -Xmx..". It avoids looking for the way shadow-cljs launches the JVM :smile:

As far as I am concerned, as closure-compiler closed the ticket, considering that only an external PR would improve the current status, the solution is to increase the stack size or heap size. @thheller Feel free close this issue if you want. If you want me to write something about this user-guide troubleshot or elsewhere, you can ask me.

In any case, thanks again for your assistance :+1:

thheller commented 2 years ago

Leaving this open until I can figure out how to handle this properly.

Currently the Closure Compiler is used in many places, not just for optimizations. So all these places need to execute in a thread with a huge stack size rather than the default. This would be easy enough to do for the watch and :parellel-build threads but the main task is currently just reusing the current thread of the caller, which may be embedded.

This could also end up drastically increasing the memory use if Threads suddenly use X more memory. Technically only the invocations to the Closure Compiler itself need this stack size so could maybe use a threadpool for this and hand off all calls that way while leaving all other threads the same smaller stack.

I believe with the highlight.js issue being removed by default that addresses the most common cause. Leaving this as is until it appears in another npm package.

Kineolyan commented 2 years ago

If you want some assistance, I would be tempted to look at it (maybe with guidance). However, looking at the repo, I haven't found anything explaining how shadow-cljs was working, nor where to start. Do such document exist? If not, I may one day dive into it, out of curiosity, though I am currently also looking at other projects

thheller commented 2 years ago

This is currently at the lowest possible priority given that the current only cause I know of is a file in highlight.js that 99.9% of people probably don't need anyways. If I ever make a pass over the affected code I may make some adjustments.

You can check and confirm that bumping the re-highlight dependency to [superstructor/re-highlight "2.0.1"] fixes things. Might also be covered by just bumping re-frame-10x, not sure. The point is not including the unnecessary file in the first place which the new re-highlight release covers.