Open olafurpg opened 4 years ago
The issue looks related to native-image, I am unable to reproduce with the JVM version of bloopgun
$ coursier launch ch.epfl.scala:bloopgun_2.12:1.4.0-RC1 --main-class bloop.bloopgun.Bloopgun -- exit
``
Thanks for reporting! I'm not sure why this might be happening. I would expect bloopgun to be able to communicate with the server even if bloop exit
fails.
I've just tried to reproduce this issue and couldn't. I'm using GraalVM 19.2.1 (/Library/Java/JavaVirtualMachines/graalvm-ce-19.2.1/Contents/Home/jre
).
I'm unable to reproduce this with a single bloop exit
command but I am still able to consistently reproduce this when running the same failing test. The problem seems unrelated to native-image, since I can reproduce it with both a standalone JVM bloopgun binary as well as bloop.py
.
❯ bloop help # starts new server
❯ bloop test <redacted>
...
===============================================
Total duration: 0ms
1 failed
Failed:
- ...
* null - java.lang.ExceptionInInitializerError: null
===============================================
After the test fails I'm no longer able to run any bloop command
❯ bloop help
error: Unexpected error forces client exit!
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at bloop.shaded.snailgun.protocol.Protocol.$anonfun$processChunkFromServer$1(Protocol.scala:181)
at scala.util.Try$.apply(Try.scala:213)
at bloop.shaded.snailgun.protocol.Protocol.processChunkFromServer(Protocol.scala:180)
at bloop.shaded.snailgun.protocol.Protocol.sendCommand(Protocol.scala:108)
at bloop.shaded.snailgun.TcpClient.run(TcpClient.scala:34)
at bloop.bloopgun.BloopgunCli.executeCmd$1(Bloopgun.scala:255)
at bloop.bloopgun.BloopgunCli.fireCommand(Bloopgun.scala:261)
at bloop.bloopgun.BloopgunCli.run(Bloopgun.scala:216)
at bloop.bloopgun.Bloopgun$.main(Bloopgun.scala:560)
at bloop.bloopgun.Bloopgun.main(Bloopgun.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at coursier.bootstrap.launcher.a.a(Unknown Source)
at coursier.bootstrap.launcher.ResourcesLauncher.main(Unknown Source)
I'm only able to stop the Bloop server with the following command
❯ kill $(jps -l | grep bloop.Server | awk '{ print $1 }')
When I reproduce with bloop.py
I get a different error message:
❯ ~/dev/metals/metals/src/main/resources/bloop.py test ...
...
Failed:
- ...
* null - java.lang.ExceptionInInitializerError: null
===============================================
❯ ~/dev/metals/metals/src/main/resources/bloop.py help
Server unexpectedly disconnected in recv_into()%
Adding --verbose to the test
command doesn't reveal any interesting information from what I can see
[D] Runner for com.geirsson.junit.PantsFramework produced 1 initial tasks for 1 tests.
[D] Running TaskDef(<redacted>, sbt.ForkMain$SubclassFingerscan@3f7bdc73, false, [SuiteSelector])
2020-01-27 15:43:47,976 INFO c.t.u.l.Slf4jBridgeUtility$ org.slf4j.bridge.SLF4JBridgeHandler installed.
...
==> X null 0.0s java.lang.ExceptionInInitializerError: null
Caused by: java.lang.NullPointerException
[D] Produced 0 nested tasks and 1 events.
Execution took 0ms
1 tests, 1 failed
[D] Test server has been successfully closed.
[D] The process is closed. Emptying buffer...
[D] Forked process exited with code: 0
[D] Process 71263 exited with code: 0
[D] Cleaning up temporary MANIFEST jar: /var/folders/y5/gbz6gd297z57_5bg0nm589nr0000gp/T/jvm-forker-manifest7835550152620902158.jar
===============================================
Total duration: 0ms
1 failed
Failed:
- <redacted>
* null - java.lang.ExceptionInInitializerError: null
===============================================
[D] Test suites failed: true
[D] Elapsed: 11725.022629 ms
I'm able to reproduce with 1.4.0-RC1 as well as 1.4.0-RC1+16-1cd95092
I may have tracked down the root cause by running bloop console PROJECT_NAME
bloop console <redacted>
Loading...
Compiling (synthetic)/ammonite/predef/interpBridge.sc
scala.reflect.internal.FatalError: Error accessing /Users/lgeirsson/workspace/source/.pants.d/export-dep-as-jar/export-dep-as-jar/252d64521cf9/tmpn0s2b0vm.jar
scala.tools.nsc.classpath.AggregateClassPath.$anonfun$list$3(AggregateClassPath.scala:99)
This Bloop JSON file contains jars on the classpath, which don't exist. This JSON file was created with an older version of the Pants exporter. I am unable to reproduce when using the latest version of the Pants exporter (which copies over jars and uses an immutable classpath).
I believe Bloop should still be able to gracefully handle this situation, maybe by reporting a warning under --verbose when the classpath contains non-existing entries.
A minimized reproduction can be found here https://github.com/olafurpg/intellij-bsp-pants/issues/11
Following up from an offline investigation, @jvican discovered that the Pants test runner is setting System.err
which (to my understanding) caused an exception in Nailgun that shuts down the IO scheduler in Bloop causing the unresponsiveness. I can confirm that I'm unable to reproduce the error by changing the Pants test runner to only set System.err
during test execution and setting System.err
back to the old value in a finally block.
Thanks for the follow-up and the reproduction, I'm happy we tracked this issue down!
Given that System.setErr
could be triggered by classloading the test jars, I believe we should add some logic to bloop to reset back streams right after classloading is done. Nailgun needs the streams to be instances of their concrete thread-local stream classes so that it can redirect in, out and err directly to the socket streams of every client, even when several clients run at the same time.
The implementation of the fix on the bloop side should account for the fact that test classloading can happen concurrently when parallel execution is enabled.
Using a native-image of bloopgun, I'm unable to run
bloop exit
After running this command once, I'm unable to communicate with the Bloop server with other commands