testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License
8.01k stars 1.65k forks source link

Opportunities for refactoring in GenericContainer #107

Open ostacey opened 8 years ago

ostacey commented 8 years ago

The startup code in GenericContainer seems pretty aggressive in retrying startup failures. I've looked at the code and I'm not sure about the value of some of the things it's doing. I think I may be missing some of the use cases you're trying to support here.

First, there's the inner loop where it waits for the status "running". I've tried this, but I've yet to see a case where a container started up correctly and didn't immediately return "running"... but again, maybe I just haven't seen the use case. It also seems that, since there's no current way to set the container as retrying, if the state is "exited" and the finished at time is not null, it's safe to assume the container will never start up, so we could bail out before the full 30 seconds.

Second, the outer loop, where we retry 3 times. I can't think of a case where the container will fail to start the first time and succeed the second, unless you're starting multiple containers in parallel, or unless the container is flaky - and in the context of an automated test, if something is flaky I'd think that reporting the failure would be more desirable. The retry loop adds complexity, and at least one bug (#102), so I wonder if it would be better to just remove it. Unless, again, there's a use case it supports that I'm not aware of...

rnorth commented 8 years ago

Hi, Thanks for raising this plus the other issues and PR - it's all helpful and appreciated :)

Regarding the inner loop wait for running state, this code is fairly old so I need to go back through the commits and try and dig up the rationale for the extra-paranoid guard. I really like your idea of using the finishedAt field to fail-fast - that's something that I'd like to incorporate.

The outer loop retry is, sadly, an interesting case. This relates to #51, which manifested as containers very rarely failing to open their listening ports before the timeout. This was quite an issue for a while, but I couldn't discern any particular pattern or reproduceability. Normally I'm as uncomfortable as anybody about flaky elements, but basically my reasoning was that (a) the container is largely immutable, so any flakiness should (mostly) be caused by its environment/infrastructure, (b) there were no other observable ill effects - no other noticeable random test outcomes or tests miraculously passing which shouldn't (which would require a total lack of assertions at the very least). On this basis, as most likely a docker infra or library issue, I judged adding the retry to prevent false negative test failures as the lesser of two evils...

All this said, we might need to carefully reassess this in the context of the work being done by @bsideup for Dockerfile-based container support. If there's an impending use case where the container is actually the component under test, we might need to make a different value judgement about this retry loop.

Thanks for bringing this up - it may cause me some sleepless nights, but it's probably a good thing to be thinking about!

Rich

ostacey commented 8 years ago

I've been thinking about how some things might get simpler if GenericContainer was changed to use a Builder pattern. That is, you'd have two classes:

start() would initialize a container and, if it started up OK, assign it to the "container" member variable. Container would be responsible for its own cleanup, so if "start()" had to retry, that wouldn't cause a problem.

That might also help clean up the API of GenericContainer; as it stands, it's a mix of builder-style "with" methods that only make sense as initialization time, and methods like "getMappedPort()" that only make sense on a running container. And I think applying the same approach to DockerComposeContainer might make it a bit cleaner / easier to understand, too...

That all said, that would be a pretty hefty API change. I'm not sure what your policy is on that.

rnorth commented 8 years ago

Hi @ostacey, Yes, I can definitely see the benefits of the builder pattern - it's something I've been tempted to try once or twice. As you say though it would be a fairly major change to the API, so it would need a bit of care and consideration (probably a bit more than I can give it right now, unfortunately).

However, I'll take a medium-term action to come up with some examples and run them by other users, too.

Fundamentally, though, I'd want to ensure that the general feel of the API doesn't change too much, and to ensure that migration is as painless as possible. This would probably need to be a major version number bump, too.

I think I'll change this ticket title to reflect that there are some general refactoring opportunities here that I'd like to apply or resolve more conclusively in the longer term...

Cheers

Richard

bsideup commented 7 years ago

@ostacey just letting you know that we're going to refactor it in 2.0 and make it more builder-friendly, as well as extensible. We might use some of your ideas, thank you!

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this is a mistake, please reply to this comment to keep it open. If there isn't one already, a PR to fix or at least reproduce the problem in a test case will always help us get back on track to tackle this.

rnorth commented 5 years ago

Will remove the stale label because we still want to think about this for 2.0

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this is a mistake, please reply to this comment to keep it open. If there isn't one already, a PR to fix or at least reproduce the problem in a test case will always help us get back on track to tackle this.

stale[bot] commented 5 years ago

This issue has been automatically closed due to inactivity. We apologise if this is still an active problem for you, and would ask you to re-open the issue if this is the case.