testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License

"No such image: testcontainers/ryuk:0.3.0" #3574

Closed: gesellix closed this issue 3 years ago

gesellix commented 3 years ago

Testcontainers 1.15.0 on Docker Engine 20.10/Docker for Mac 2.5.4 fails with the following stacktrace:

org.testcontainers.containers.ContainerLaunchException: Container startup failed

    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:327)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:308)
    at org.testcontainers.spock.TestcontainersMethodInterceptor.startContainers_closure3(TestcontainersMethodInterceptor.groovy:83)
    at groovy.lang.Closure.call(Closure.java:405)
    at groovy.lang.Closure.call(Closure.java:421)
    at org.testcontainers.spock.TestcontainersMethodInterceptor.startContainers(TestcontainersMethodInterceptor.groovy:80)
    at org.testcontainers.spock.TestcontainersMethodInterceptor.interceptSetupSpecMethod(TestcontainersMethodInterceptor.groovy:25)
    at org.spockframework.runtime.extension.AbstractMethodInterceptor.intercept(AbstractMethodInterceptor.java:36)
    at org.spockframework.runtime.extension.MethodInvocation.proceed(MethodInvocation.java:97)
    at org.spockframework.spring.SpringInterceptor.interceptSetupSpecMethod(SpringInterceptor.java:37)
    at org.spockframework.runtime.extension.AbstractMethodInterceptor.intercept(AbstractMethodInterceptor.java:36)
    at org.spockframework.runtime.extension.MethodInvocation.proceed(MethodInvocation.java:97)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
Caused by: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=couchdb:1.7.2, imagePullPolicy=DefaultPullPolicy())
    at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1278)
    at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:612)
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:317)
    ... 16 more
Caused by: com.github.dockerjava.api.exception.NotFoundException: Status 404: {"message":"No such image: testcontainers/ryuk:0.3.0"}

    at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:241)
    at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.post(DefaultInvocationBuilder.java:125)
    at org.testcontainers.shaded.com.github.dockerjava.core.exec.CreateContainerCmdExec.execute(CreateContainerCmdExec.java:33)
    at org.testcontainers.shaded.com.github.dockerjava.core.exec.CreateContainerCmdExec.execute(CreateContainerCmdExec.java:13)
    at org.testcontainers.shaded.com.github.dockerjava.core.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
    at org.testcontainers.shaded.com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
    at org.testcontainers.shaded.com.github.dockerjava.core.command.CreateContainerCmdImpl.exec(CreateContainerCmdImpl.java:595)
    at org.testcontainers.utility.ResourceReaper.start(ResourceReaper.java:91)
    at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:203)
    at org.testcontainers.LazyDockerClient.getDockerClient(LazyDockerClient.java:14)
    at org.testcontainers.LazyDockerClient.listImagesCmd(LazyDockerClient.java:12)
    at org.testcontainers.images.LocalImagesCache.maybeInitCache(LocalImagesCache.java:68)
    at org.testcontainers.images.LocalImagesCache.get(LocalImagesCache.java:32)
    at org.testcontainers.images.AbstractImagePullPolicy.shouldPull(AbstractImagePullPolicy.java:18)
    at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:66)
    at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:27)
    at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:17)
    at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:39)
    at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1276)
    ... 18 more
bsideup commented 3 years ago

@gesellix do you have a mirror configured? Or is this a response from Docker Hub?

gesellix commented 3 years ago

I suppose the 404 comes from the local Docker Engine and no lookup to any registry is made. In this case I don't use any private registry; the engine connects directly to Docker Hub. Manually pulling testcontainers/ryuk:0.3.0 fixes the issue.

gesellix commented 3 years ago

As far as I know, the Docker CLI had some logic like this for docker run (pseudocode):

try {
  create_container(image)
} catch (e) {
  if (e.status == 404) {
    pull_image(image)
    create_container(image)
  }
}
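
Expressed against the docker-java client, that fallback might look roughly like the sketch below. This is only an illustration of the behaviour described above, not the actual CLI or Testcontainers code; it assumes docker-java 3.2.x and a configured DockerClient, and the class/method names are made up for the example.

import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.command.PullImageResultCallback;
import com.github.dockerjava.api.exception.NotFoundException;

class CreateOrPull {
    // Try to create the container; if the daemon answers 404 ("no such image"),
    // pull the image and retry the create once.
    static String createOrPull(DockerClient client, String image) throws InterruptedException {
        try {
            return client.createContainerCmd(image).exec().getId();
        } catch (NotFoundException e) {
            client.pullImageCmd(image)
                  .exec(new PullImageResultCallback())
                  .awaitCompletion();   // block until the pull has finished
            return client.createContainerCmd(image).exec().getId();
        }
    }
}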

I don't understand, though, why we now run into such an issue. The only thing I'm aware of is https://github.com/docker/cli/pull/1498, which might be related.

gesellix commented 3 years ago

Did/does Testcontainers pull images, in this case ryuk, before trying to create containers?

keeganwitt commented 3 years ago

I'm hitting this today too, in my case for a Postgres container. I tried setting ryuk.testcontainer.image=testcontainersofficial/ryuk:0.3.0. It couldn't pull that image either. They definitely exist in Docker Hub, though.

The other thing I suspected was maybe we're hitting the Docker Hub pull limits? I thought since this is communicating with the daemon it should use the auth configured on the host, but possibly I'm misunderstanding.
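
As an aside, the Ryuk image override mentioned above normally goes into a testcontainers.properties file. A minimal sketch, assuming the documented property name is ryuk.container.image (check the Testcontainers docs for your version):

# ~/.testcontainers.properties (or testcontainers.properties on the test classpath)
# Assumption: ryuk.container.image is the property Testcontainers reads for the Ryuk image.
ryuk.container.image=testcontainersofficial/ryuk:0.3.0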

gesellix commented 3 years ago

... so maybe the pull for ryuk is performed unauthenticated?

keeganwitt commented 3 years ago

That's what I was worried about, yea. But I'm not sure if that's what's going on or not yet.

bsideup commented 3 years ago

@gesellix we definitely pull the image if it is not available.

The other thing I suspected was maybe we're hitting the Docker Hub pull limits?

In that case, the error would differ (unless the Docker Hub team has decided that 404 is a perfect HTTP status code for a rate-limited response, which should really be 429 instead)

keeganwitt commented 3 years ago

When we got the error before, from running docker commands directly in a job, we got a message that specifically said we had hit pull limits. But I don't know what HTTP status the docker binary received in that case, so I wasn't sure whether the message was being hidden by Testcontainers or not.

gesellix commented 3 years ago

While our GitHub Actions builds still work (same Testcontainers version, but different Docker Engine/operating system), I guess this is mainly related to Docker for Mac. I can give it a try with an older Docker for Mac release tomorrow.

keeganwitt commented 3 years ago

In my case, it's passing locally on Mac with the latest Docker for Mac stable (though I have those images in my cache) and failing on GitLab.

bsideup commented 3 years ago

404 and {"message":"No such image: testcontainers/ryuk:0.3.0"} is what we actually get from the API.

Also, testcontainers/* images are exempt from rate limiting, or at least that's what they told us :)

bsideup commented 3 years ago

oh wait, I think I know what it is...

keeganwitt commented 3 years ago

Well, that rules out that possibility then, at least. Maybe Docker Hub is having some problem? I just tried disabling Ryuk and then it said 404 with "No such image: alpine:3.5".

gesellix commented 3 years ago

For me Docker Hub seemed to be ok; a manual docker pull of the ryuk image made it work... well, maybe the other images were already in the local cache 🤔

keeganwitt commented 3 years ago

Good point. docker pull didn't break for me either locally.

gesellix commented 3 years ago

https://status.docker.com/pages/533c6539221ae15e3f000031 looks ok

bsideup commented 3 years ago

Ok, "filter by image name" query parameter in /images/json got removed, and now this condition fails: https://github.com/testcontainers/testcontainers-java/blob/d135a2605401f6c663aab4e7edc6d6d76716f930/core/src/main/java/org/testcontainers/DockerClientFactory.java#L330

I just submitted #3575 with a fix, will be included in 1.15.1
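
For context, the image-presence check in question depends on the daemon honouring that name filter. A simplified sketch of such a check, assuming a docker-java DockerClient (illustrative only, not the exact Testcontainers source; the method name is made up):

import java.util.List;
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.command.PullImageResultCallback;
import com.github.dockerjava.api.model.Image;

class ImageCheck {
    // If the daemon ignores the removed "filter" query parameter, the list comes back
    // unfiltered (and usually non-empty), the pull is skipped, and the later
    // createContainerCmd fails with the 404 seen in the stacktrace above.
    static void checkAndPullImage(DockerClient client, String image) throws InterruptedException {
        List<Image> matches = client.listImagesCmd()
                .withImageNameFilter(image)   // relies on the /images/json name filter
                .exec();
        if (matches.isEmpty()) {
            client.pullImageCmd(image)
                  .exec(new PullImageResultCallback())
                  .awaitCompletion();
        }
    }
}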

keeganwitt commented 3 years ago

I suppose I should mention too that I had tried with 1.15.0-rc2 and 1.15.0.

keeganwitt commented 3 years ago

Ok, "filter by image name" query parameter in /images/json got removed, and now this condition fails:

https://github.com/testcontainers/testcontainers-java/blob/d135a2605401f6c663aab4e7edc6d6d76716f930/core/src/main/java/org/testcontainers/DockerClientFactory.java#L330

I just submitted #3575 with a fix, will be included in 1.15.1

Ah, so a change in the Docker Hub API?

I deleted the ryuk image locally and ran the test again; oddly, it passed again.

bsideup commented 3 years ago

@keeganwitt Docker's API. Although the query param was deprecated (I wish we could run Docker in a strict API mode - will explore)

Sorry for this. We will release a hotfix ASAP. Meanwhile, consider pre-pulling testcontainers/ryuk:0.3.0 and alpine:3.5 :(
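
For example, pre-pulling on the machine that actually runs the Docker daemon (these are the default images mentioned in this thread):

docker pull testcontainers/ryuk:0.3.0
docker pull alpine:3.5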

keeganwitt commented 3 years ago

@keeganwitt Docker's API. Although the query param was deprecated (I wish we could run Docker in a strict API mode - will explore)

Sorry for this. We will release a hotfix ASAP. Meanwhile, consider pre-pulling testcontainers/ryuk:0.3.0 and alpine:3.5 :(

Yea, I'd thought of that, but I'm not sure it's possible with GitLab's Docker Executor. It should be possible to run it as a script with the Shell Executor instead, I suppose. I'm still confused why it worked locally after deleting the ryuk image, though... Maybe different Docker daemon versions?

gesellix commented 3 years ago

Thanks @bsideup for the quick fix!

DaspawnW commented 3 years ago

Are you planning to backport this so it also works with JUnit 4?

bsideup commented 3 years ago

@DaspawnW this is not junit specific and, once released, will work with any type of integration (junit4, junit jupiter, spock, manual container lifecycle)

arhohuttunen commented 3 years ago

@keeganwitt did you ever find a reasonable workaround for builds running in Gitlab? We have been looking at this for a day now without much success. It works if you manually pre-pull the images, but we are using docker-machine to autoscale the runners in EC2, so manual work is not really an option.

jdelucaa commented 3 years ago

@bsideup I am seeing this also in 1.14.0.

bsideup commented 3 years ago

@jdelucaa yes, this Docker API change applies to most of Testcontainers versions.

keeganwitt commented 3 years ago

@keeganwitt did you ever find a reasonable workaround for builds running in Gitlab? We have been looking at this for a day now without much success. It works if you manually pre-pull the images, but we are using docker-machine to autoscale the runners in EC2, so manual work is not really an option.

Not really. I have exactly the same setup. For now, we just commented the tests out, since a fix is forthcoming. A few ideas came to mind, but I haven't really thought through them yet.

  1. Customize the AMI to bake in the images. Not totally sure if that'd work, and seems like a bad idea.
  2. Put the docker pull commands as a user data script.
  3. Don't use the Docker Executor, and instead use the Shell Executor and first pull the images. Assuming that works, you'd have to specify the executor on the runner itself, so you'd need a separate set of runners to do this (which is possible, just give those runners a different tag).
  4. Use DinD (Docker in Docker), and execute your build in a Dockerfile, rather than a script that runs on an image. Might take some fiddling to get the volumes mounted correctly; I've never tried to do that before.

None of these seemed great. If one of them sounds promising, I can explain in a little more detail what I was thinking, though there may be gotchas I haven't thought of. Offhand, the user data script seems like the most promising to me.

arhohuttunen commented 3 years ago

We thought about option 1, but quickly discarded that idea. We also tried option 2, but apparently Docker isn't installed at that point yet, so we didn't proceed further with that. Neither 3 nor 4 felt like a good idea, so I guess we'll just skip the tests using Testcontainers for now.

Thanks for sharing, though.

keeganwitt commented 3 years ago

@arhohuttunen I'm now thinking this broke because we upgraded GitLab Runners this week, which upgraded the Docker version. So downgrading should fix that. Unless others didn't upgrade and still ran into this? I could have sworn we had tests pass after the upgrade, but I'm not sure what else could have changed.

I upgraded Docker for Mac to 3.0 (which has Docker 20.10 in it) this morning, and the tests now fail locally too.

arhohuttunen commented 3 years ago

@keeganwitt I think on the runners we are using the docker stable tag, which should still point to 19.03.14 according to this: https://hub.docker.com/layers/docker/library/docker/stable/images/sha256-8f71deccd0856d8a36db659a8c82894be97546b47c1817de27d5ee7eea860162?context=explore

keeganwitt commented 3 years ago

@arhohuttunen Sorry, I didn't mean the runners image, I meant the machines on which the runner image runs (where the daemon lives). We use https://github.com/npalm/terraform-aws-gitlab-runner, which would upgrade that. This applies to private runners, not the shared ones that GL manages. I dunno what schedule those are upgraded on, we don't use them.

I've been talking with another of our engineers and he said userdata isn't the same as ec2-userdata. I didn't realize there were 2.

Urokhtor commented 3 years ago

We use a hard-coded AMI in our runners and haven't changed that lately, so that should not be the root cause in our case.

keeganwitt commented 3 years ago

Actually, some builds passed after our upgrade too, so it shouldn't have been that. I'm confused why yesterday was the breaking day.

Urokhtor commented 3 years ago

I'll take my previous comment back, because those runners install Docker from the official repo, which does serve 20.10.

jeantil commented 3 years ago

I'm encountering this issue on Apache CI builds for the apache/james project (https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/16/pipeline). I tried pulling the image explicitly before running the tests, but it still fails. We are very much looking forward to the 1.15.1 hotfix.

edit: I was misled by a comment above that referred to testcontainersofficial instead of testcontainers; I assume this was some kind of custom setup. The tests try to get testcontainers/ryuk:0.3.0 by default, not testcontainersofficial/ryuk:0.3.0.

keeganwitt commented 3 years ago

@jeantil That's correct, testcontainers is the default, testcontainersofficial was just a misguided thing I attempted early on (overriding the image in testcontainers properties). I saw testcontainersofficial mentioned in an issue where they were discussing Docker Hub and Quay. Sorry for the confusion.

mderouet commented 3 years ago

When can we expect the 1.15.1 release? (So we know whether we need to find a workaround for this or can just wait for it.)

bsideup commented 3 years ago

@mderouet the release is expected for later today (tsss ;))

bsideup commented 3 years ago

released in 1.15.1 🎉

telapo commented 3 years ago

It works, thanks

alex-sky-cloud commented 3 years ago

I got an update from Docker today, and now have Docker Desktop 3.0.1 (50773).

Now I get an error while running Testcontainers.

...... <<< ERROR!
org.testcontainers.containers.ContainerLaunchException: Container startup failed
Caused by: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=postgres:11.7, imagePullPolicy=DefaultPullPolicy())
Caused by: com.github.dockerjava.api.exception.NotFoundException: {"message":"No such image: testcontainersofficial/ryuk:0.3.0"}

I assumed you had fixed it. I also disabled the "Use gRPC FUSE for file sharing" option, but it didn't help.

How can I fix this?

bsideup commented 3 years ago

@alex-sky-cloud this is another issue, unrelated to the file sharing, and it is fixed in 1.15.1; please update.

alex-sky-cloud commented 3 years ago

I'm sorry. What should I update ?

bsideup commented 3 years ago

@alex-sky-cloud the project you're reporting to - Testcontainers :D

alex-sky-cloud commented 3 years ago

I have the latest version of docker.

Which project should I update?

I executed the

docker system prune -af

command and only after that the error disappeared.

So, is this something I will need to do every time?

bsideup commented 3 years ago

@alex-sky-cloud No. Just use the latest (1.15.1) version of Testcontainers.
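
In case it helps anyone landing here: the fix ships with the Testcontainers library itself, so bumping the dependency is all that's needed. A minimal sketch for a Gradle build (core library plus, as an example, the PostgreSQL module; adjust to the modules you actually use):

// build.gradle (Groovy DSL) - bump Testcontainers to the release containing the fix
testImplementation 'org.testcontainers:testcontainers:1.15.1'
testImplementation 'org.testcontainers:postgresql:1.15.1'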

davoutuk commented 3 years ago

This issue is still live for me; I hit it when using this on a Spring Boot/Postgres app. The workaround of downloading the 'testcontainer' Docker image separately works.
I would suggest adding a note to the docs to explain that this is a prerequisite.

bsideup commented 3 years ago

@davoutuk with Testcontainers 1.15.1? There was a bug that got fixed in 1.15.1; there is no such prerequisite.