quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

mongodb devservices tests failing accessing docker binary on M1 runner #28779

Open holly-cummins opened 1 year ago

holly-cummins commented 1 year ago

Describe the bug

The headline symptom is that all the tests in the mongodb-devservices suite fail, only on the Mac M1 runner.

The headline failure is

[Check failure on line 45 in integration-tests/mongodb-devservices/src/test/java/io/quarkus/it/mongodb/BookResourceTest.java](https://github.com/quarkusio/quarkus/commit/9bf92f7149b08bb3ab313d615ca35e75cb4274f5#annotation_5275206252)

JVM Tests - JDK 17 MacOS M1

java.net.SocketTimeoutException: Read timed out
    at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
    at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
    at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
    at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
    at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)

The root cause is this:

2022-10-21T04:12:53.1380870Z 2022-10-21 00:12:53,097 WARN  [io.qua.dep.IsDockerWorking] (build-23) No docker binary found or general error: java.lang.RuntimeException: Input/Output error while executing command.

It's the exact same error you get if you don't have podman on your machine and try to run the docker tests. But it's only affecting one test suite, so it's obviously not a machine-wide issue. I wonder if it's some sort of cwd issue, or a fork of the shell that means it can't find the binary.

This is the code which is failing. It fails only for the mongodb dev services tests, even though the code is in util and couldn't be more common:

            try {
                if (!ExecUtil.execSilentWithTimeout(Duration.ofMillis(DOCKER_CMD_CHECK_TIMEOUT), binary, "-v")) {
                    LOGGER.warnf("'%s -v' returned an error code. Make sure your Docker binary is correct", binary);
                    return Result.UNKNOWN;
                }
            } catch (Exception e) {
                LOGGER.warnf("No %s binary found or general error: %s", binary, e);
                return Result.UNKNOWN;
            }
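
One way to narrow down the cwd / shell-fork theory would be to run the same kind of check outside Quarkus and print the environment the child process would inherit. The following is a minimal standalone sketch, not the Quarkus implementation: `BinaryProbe`, its `Result` enum and `probe` are hypothetical names, and it uses plain `ProcessBuilder` rather than `ExecUtil`. It assumes Java 9 or later for `Redirect.DISCARD`.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic helper; it only mimics the shape of the "<binary> -v" check.
public class BinaryProbe {

    enum Result { AVAILABLE, ERROR_EXIT, TIMED_OUT, NOT_LAUNCHABLE }

    static Result probe(String binary, Duration timeout) {
        ProcessBuilder pb = new ProcessBuilder(binary, "-v")
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD); // discard output so the child cannot block on a full pipe
        try {
            Process process = pb.start();
            if (!process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                process.destroyForcibly();
                return Result.TIMED_OUT;
            }
            return process.exitValue() == 0 ? Result.AVAILABLE : Result.ERROR_EXIT;
        } catch (IOException e) {
            // The binary could not be started at all. Print what the JVM sees,
            // since a different cwd or PATH on the runner is one of the theories.
            System.err.printf("Could not launch '%s': %s%n", binary, e.getMessage());
            System.err.printf("  user.dir = %s%n", System.getProperty("user.dir"));
            System.err.printf("  PATH     = %s%n", System.getenv("PATH"));
            return Result.NOT_LAUNCHABLE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Result.NOT_LAUNCHABLE;
        }
    }

    public static void main(String[] args) {
        String binary = args.length > 0 ? args[0] : "docker";
        System.out.println(binary + ": " + probe(binary, Duration.ofSeconds(5)));
    }
}
```

Running this from the same working directory and user as the failing suite would show whether the binary is missing from `PATH` for that process, or whether the launch fails for some other I/O reason.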

How to Reproduce?

Not reproducible locally. Appears on M1 CI/CD builds.

quarkus-bot[bot] commented 1 year ago

/cc @evanchooly, @gastaldi, @geoand, @loicmathieu, @stuartwdouglas

holly-cummins commented 1 year ago

See also https://quarkusio.zulipchat.com/#narrow/stream/187038-dev/topic/M1.20.2B.20MongoDB/near/305314594

geoand commented 1 year ago

I personally have no idea why only the mongo tests would be affected by this...

holly-cummins commented 1 year ago

Some sort of shell-forking behaviour is my best theory, but I can't find any evidence of it in the build scripts. It's just so weird.

This is a relatively new failure. I looked through the code for recent changes and there wasn't anything in an area which seemed relevant. That's consistent with it not reproducing on my own mac, and suggests it's a machine issue ... but it leaves me none the wiser about what's actually going on.

holly-cummins commented 1 year ago

Thinking about it, I will disable the tests, for two reasons. I'm normally reluctant to disable M1 tests just because they fail, since that kind of defeats the point of having tests to tell you things don't work. But it will reduce noise in the builds in the short term, while we try and diagnose.

It will also be a useful diagnostic tool. If the failure moves to the next suite along, we know it's something in an earlier suite that messes up the env. If everything is green, we know it's something in this suite which is off.

geoand commented 1 year ago

👍🏼

holly-cummins commented 1 year ago

And the verdict from that build is that it's green on M1.

The runner seems to be having some annoying disk space issues with purgeable files not being cleared out, so if the mongodb artifacts are especially big, that could maybe account for some i/o strangeness ... but it's still odd it would fail just for the one test suite, and then recover.

holly-cummins commented 1 year ago

See also https://github.com/quarkusio/quarkus/pull/29094#issuecomment-1306871569, a similar failure, but with a testcontainers stack trace.