testcontainers / testcontainers-python

Testcontainers is a Python library that providing a friendly API to run Docker container. It is designed to create runtime environment to use during your automatic tests.
https://testcontainers-python.readthedocs.io/en/latest/
Apache License 2.0
1.38k stars 253 forks source link

Dependencies: Stop bundling client libraries #526

Open santi opened 1 month ago

santi commented 1 month ago

When installing some of our community modules through extras, we usually bundle a client library for interacting with the service provided from the container. In some cases this is nice, as there is only a single, official client library available for that service (E.g. qdrant-client).

However, in many cases there is no single client library that makes for the best choice. Some examples are sqlalchemy in combination with other drivers for connecting to databses, where you have the choice between multiple drivers (such as pyodbc and pymssql for SQL Server, pg8000 and psycopg for Postgres and so on). Many use cases don't even use sqlalchemy for their database connection, and we are thus forcing an install of one or more useless packages in their builds.

Therefore I suggest that we remove all dependencies for client libraries in our extras and put the responsibility for connecting to the containers with their chosen framework on the users of tc-python. We will still keep client libraries as part of our dev dependencies, so that we can provide usage examples in our doctests and other documentation, but simply installing testcontainers-python with some extras will no longer give you a large bundle of dependencies you might not need.

Where possible, we will still include and maintain methods such as get_connection_string (and variants, where we should probably standardize on a naming convention for consistency) for easy usage with client libraries. Example usage with client libraries also become more important, but in those examples we always show imports of the client library used, so it should be quite self-explanatory that you need a client to connect to a service.

This work has already been partly started, and contain useful discussion around the issue. See here for the work done on removing sqlalchemy from Postgres.

bearrito commented 1 month ago

I had an approach for the recently added nats module, ultimately we just included the standard client. I'll throw it out as an idea. Namely if you want to run the test using the semi-canonical client it works.

NO_NATS_CLIENT = True
try:
    from nats import connect as nats_connect
    from nats.aio.client import Client as NATSClient

    NO_NATS_CLIENT = False
except ImportError:
    pass

 @pytest.mark.skipif(NO_NATS_CLIENT, reason="No NATS Client Available")
async def test_pubsub():  
  ... elided ...

Couple other thoughts

  1. One thing I see the Postgres stuff doing is execing into the containers and running commands. It isn't obvious all the services have this sort of ability, and if they do if the required libraries are included in their respective containers.
  2. How do you test that the connstring is correct?
  3. Maybe folks have better tooling and dev workflows than me, but I think a contributor would be less likely to have good code coverage if they are implemented as doc tests.
santi commented 1 month ago

I had an approach for the recently added nats module, ultimately we just included the standard client. I'll throw it out as an idea. Namely if you want to run the test using the semi-canonical client it works.

NO_NATS_CLIENT = True
try:
    from nats import connect as nats_connect
    from nats.aio.client import Client as NATSClient

    NO_NATS_CLIENT = False
except ImportError:
    pass

 @pytest.mark.skipif(NO_NATS_CLIENT, reason="No NATS Client Available")
async def test_pubsub():  
  ... elided ...

I still think we should include the client libraries as dev dependencies, to make it easier to test in general and to test that common usage patterns are working as intended. However, all dev dependencies should be optional, so that they are not installed automatically when installing the testcontainers package itself.

Couple other thoughts

  1. One thing I see the Postgres stuff doing is execing into the containers and running commands. It isn't obvious all the services have this sort of ability, and if they do if the required libraries are included in their respective containers.

Yupp, and if the images come bundled with tools intended for readiness checking I think we can utilize that more to actually perform readiness checking. Parsing logs are okay, but not always ideal, especially if there is no message where it is abundantly clear that the message is meant to indicate that the container is ready.

  1. How do you test that the connstring is correct?

I think we misunderstood each other a little here. I still would like to include client libraries as dev dependencies, so that they can be used for tests.

  1. Maybe folks have better tooling and dev workflows than me, but I think a contributor would be less likely to have good code coverage if they are implemented as doc tests.

Agreed. doctests in my opinion is only for showing usage of the classes/modules, and the doctests themselves are only there to ensure that the example actually works. With little syntax highlighting and help from your editor, the doctests often end up invalid, and the doctests help prevent that