testcontainers / testcontainers-python

Testcontainers is a Python library that providing a friendly API to run Docker container. It is designed to create runtime environment to use during your automatic tests.
https://testcontainers-python.readthedocs.io/en/latest/
Apache License 2.0
1.51k stars 281 forks source link

Feature: Seed data for DbContainer #541

Open OverkillGuy opened 4 months ago

OverkillGuy commented 4 months ago

What are you trying to do?

In order to do meaningful tests using my fresh database, I wish to load arbitrary SQL scripts from my DbContainer-s.

I can do (and have done) this per-project, in test folders, but believe there's value in allowing an arbitrary list of SQL scripts to be run before the testcontainer is yielded, and making this a pytest fixture to reuse.

I do this usually by first passing a path to a folder with scripts, volume-mount it to /seeds path in the database container, and also pass a list of script files in that folder to be executed. Then before yielding the ready db-container, I loop over each script filename and do container.exec_run(["mysql", "-e" f"source /seeds/{script}").

The list of scripts means we can feed both schema first, then arbitrary sample data files.

I suggest a new DbContainer._seed() method, defaulting to raise NotImplementedError, to be overriden per database, allowing a new optional parameter seed.

Each database then just has to define their _seed() function to the specific way to exec_run the scripts (psql for Postgres, mysql for MySQL CLI...)

Example from #542 implementation draft:

>>> import sqlalchemy
>>> from testcontainers.mysql import MySqlContainer
>>> seed_data = ("db/", ["schema.sql", "data.sql"])
>>> with MySqlContainer(seed=seed_data) as mysql:
...     engine = sqlalchemy.create_engine(mysql.get_connection_url())
...     with engine.begin() as connection:
...         query = "select * from stuff"  # Can now rely on schema/data
...         result = connection.execute(sqlalchemy.text(query))
...         first_stuff, = result.fetchone()

Why should it be done this way?

I believe most users of testcontainers, beyond just testing connectivity with a database, want a simple-to-reproduce test environment with realistic datastructure and mock data. This is currently annoying to do locally, and can be upstreamed into a simple-looking seed parameter, which could enable the feature for most databases, by pushing it into DbContainer itself.

See #542 for a draft implementation using MySQL as target (generalizeable to others of course)

alexanderankin commented 4 months ago

how does this compare to this functionality in the mysql and postgres images?

OverkillGuy commented 4 months ago

how does this compare to this functionality in the mysql and postgres images?

Good shout, I didn't know these existed (there goes my evening, reading the manual like I should have done first thing)

Let's say, at best, this proposed seed option can get reduced to just the local path, and we mount it to those standard /docker-entrypoint-initdb.d/ folders, simplifying most of the implementation This can harmonize the feature for images that support it?

Or we could just scrap the feature request altogether because it exists already as part of most images, haha. Either way works for me, the feature gap itself is solved by the links provided.

alexanderankin commented 4 months ago

i think theoretically it could be a neat thing to highlight the feature in the library images, but since it seems like we are now discussing quite a different implementation than proposed in the PR, im going to close the PR

OverkillGuy commented 4 months ago

Makes sense!

Just to be clear, the only convenient way I see of exposing the /docker-entrypoint-initdb.d/ folder is to mount it as .with_volume_mapping() before the container is created in docker, which requires a change in the DbContainer derivatives themselves?

The less convenient way that I don't recommend is to have some kind of Dockerfile with:

FROM somedatabaseimage
COPY db/seeds/ /docker-entrypoint-initdb.d/

and then using that image? That doesn't feel very convenient, building an intermediary image we don't care about (long term) just to get this one test to work?

With that in mind, I will explore a similar seed parameter as before, but with the intent to mount a volume to that well known /docker-entrypoint-initdb.d/ folder, during the DbContainer creation, but of course I welcome different solutions.

I'm aiming for:

with MysqlContainer(seed="db/seeds/"):
# equivalent to:
container.with_volume_mapping("db/seeds/", "/docker-entrypoint-initdb.d/")

I'll need to confirm if this folder works for ALL DbContainer variants (in which case it's nice to move it to that level) or if it's an image-specific thing, in which case I leave it to the mysql or postgres level.

OverkillGuy commented 4 months ago

I'm aiming for:

with MysqlContainer(seed="db/seeds/"):
# equivalent to:
container.with_volume_mapping("db/seeds/", "/docker-entrypoint-initdb.d/")

Implemented in #552 for MySQL, to further this discussion with specific example. If that's deemed useful, I can do the same for postgres and then look for other images supporting this folder structure.

Edit to add: Note I'm not attached to any of these PRs, so feel free to chuck them out if that's not useful. I'm just happy to get familiar with the innards of this codebase for my own future uses as I explore this feature request, hoping the PRs make the discussion easier.

alexanderankin commented 4 months ago

i think theoretically there is a preference for is for using transferable over volume mapping, because volume mapping only works locally (local to the docker engine), but we dont have that in python implementation yet - it looks something like this - theoretically my implementation could be even more tied to tar equivalent and also there's an even more concise version of "copy file to container" algorithm floating around in the codebase already, like in the kafka module. havent had time to really think about this too much but i approved your run and it looks like all the checks passed.

OverkillGuy commented 4 months ago

So, I was curious about put_archive and the "transferable" system, so I implemented it experimentally, and it didn't work: The files did make it to the folder, but the tables weren't loaded in any of the tests, which I can see by running docker exec on breakpoints.

Can't prove it, but I think that since the archive can only be loaded once the container has started (compared to volume mounts which become part of the creation of container), the mysql server is no longer looking at the /docker-entrypoint-initdb.d/ folder by the time data gets in, so the test table stays empty and fails test.

Attached is the patch (applies on top of the linked PR), in case it's useful anyway?

Patch: transferring using tar archives. Click to expand ```patch From ac0fc34717c27b53d3f291ca59b3218d2036c762 Mon Sep 17 00:00:00 2001 From: Jb DOYON Date: Fri, 26 Apr 2024 02:06:06 +0100 Subject: [PATCH] Experiment with transfering via tar archives --- core/testcontainers/core/generic.py | 4 ++++ modules/mysql/testcontainers/mysql/__init__.py | 14 +++++++++++++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/core/testcontainers/core/generic.py b/core/testcontainers/core/generic.py index 515c283..23c24ea 100644 --- a/core/testcontainers/core/generic.py +++ b/core/testcontainers/core/generic.py @@ -71,7 +71,11 @@ class DbContainer(DockerContainer): self._configure() super().start() self._connect() + self._transfer_seed() return self def _configure(self) -> None: raise NotImplementedError + + def _transfer_seed(self) -> None: + pass diff --git a/modules/mysql/testcontainers/mysql/__init__.py b/modules/mysql/testcontainers/mysql/__init__.py index e95b87d..c7ed97b 100644 --- a/modules/mysql/testcontainers/mysql/__init__.py +++ b/modules/mysql/testcontainers/mysql/__init__.py @@ -11,7 +11,10 @@ # License for the specific language governing permissions and limitations # under the License. import re +import tarfile +from io import BytesIO from os import environ +from pathlib import Path from typing import Optional from testcontainers.core.generic import DbContainer @@ -85,7 +88,7 @@ class MySqlContainer(DbContainer): if self.username == "root": self.root_password = self.password if seed is not None: - self.with_volume_mapping(seed, "/docker-entrypoint-initdb.d/") + self.seed = seed def _configure(self) -> None: self.with_env("MYSQL_ROOT_PASSWORD", self.root_password) @@ -105,3 +108,12 @@ class MySqlContainer(DbContainer): return super()._create_connection_url( dialect="mysql+pymysql", username=self.username, password=self.password, dbname=self.dbname, port=self.port ) + + def _transfer_seed(self) -> None: + src_path = Path(self.seed) + dest_path = "/docker-entrypoint-initdb.d/" + with BytesIO() as archive, tarfile.TarFile(fileobj=archive, mode="w") as tar: + for filename in src_path.iterdir(): + tar.add(filename.absolute(), arcname=filename.relative_to(src_path)) + archive.seek(0) + self.get_wrapped_container().put_archive(dest_path, archive) -- 2.34.1 ```

Curious if you have an idea on how to work around that.

EDIT: Wrong again, it's because I waited to do the archive transfer till AFTER the connect, not before, which waits for container ready, and moving it before makes it work again. Updated #552 to use this new system!

alexanderankin commented 4 months ago

Yes, transferring only works after it starts, so you want to run that after calling super().start() in the overriden .start()

On Thu, Apr 25, 2024, 9:20 PM Jb DOYON @.***> wrote:

So, I was curious about put_archive and the "transferable" system, so I implemented it experimentally, and it didn't work: The files did make it to the folder, but the tables weren't loaded in any of the tests, which I can see by running docker exec on breakpoints.

Can't prove it, but I think that since the archive can only be loaded once the container has started (compared to volume mounts which become part of the creation of container, the mysql server is no longer looking at the /docker-entrypoint-initdb.d/ folder by the time data gets in, so the test table stays empty and fails test.

Attached is the patch (applies on top of the linked PR), in case it's useful anyway? Patch: transferring using tar archives. Click to expand ```patch From ac0fc34717c27b53d3f291ca59b3218d2036c762 Mon Sep 17 00:00:00 2001 From: Jb DOYON Date: Fri, 26 Apr 2024 02:06:06 +0100 Subject: [PATCH] Experiment with transfering via tar archives

core/testcontainers/core/generic.py | 4 ++++ modules/mysql/testcontainers/mysql/init.py | 14 +++++++++++++- 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/core/testcontainers/core/generic.py b/core/testcontainers/core/generic.py index 515c283..23c24ea 100644 --- a/core/testcontainers/core/generic.py +++ b/core/testcontainers/core/generic.py @@ -71,7 +71,11 @@ class DbContainer(DockerContainer): self._configure() super().start() self._connect()

-

  self._transfer_seed()
  return self

def _configure(self) -> None: raise NotImplementedError

-

def _transfer_seed(self) -> None:

  pass

diff --git a/modules/mysql/testcontainers/mysql/init.py b/modules/mysql/testcontainers/mysql/init.py index e95b87d..c7ed97b 100644 --- a/modules/mysql/testcontainers/mysql/init.py +++ b/modules/mysql/testcontainers/mysql/init.py @@ -11,7 +11,10 @@ License for the specific language governing permissions and limitations under the License.

import re +import tarfile +from io import BytesIO from os import environ +from pathlib import Path from typing import Optional

from testcontainers.core.generic import DbContainer @@ -85,7 +88,7 @@ class MySqlContainer(DbContainer): if self.username == "root": self.root_password = self.password if seed is not None:

-

      self.with_volume_mapping(seed, "/docker-entrypoint-initdb.d/")

-

      self.seed = seed

def _configure(self) -> None: self.with_env("MYSQL_ROOT_PASSWORD", self.root_password) @@ -105,3 +108,12 @@ class MySqlContainer(DbContainer): return super()._create_connection_url( dialect="mysql+pymysql", username=self.username, password=self.password, dbname=self.dbname, port=self.port )

-

def _transfer_seed(self) -> None:

  src_path = Path(self.seed)

-

  dest_path = "/docker-entrypoint-initdb.d/"

-

  with BytesIO() as archive, tarfile.TarFile(fileobj=archive, mode="w") as tar:

-

      for filename in src_path.iterdir():

-

          tar.add(filename.absolute(), arcname=filename.relative_to(src_path))

-

      archive.seek(0)

-

      self.get_wrapped_container().put_archive(dest_path, archive)

-- 2.34.1

Curious if you have an idea on how to work around that.

— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-python/issues/541#issuecomment-2078463631, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACECGJGIIFE5GNZ5OKQV3L3Y7GTWZAVCNFSM6AAAAABGKF4JQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZYGQ3DGNRTGE . You are receiving this because you commented.Message ID: @.***>

OverkillGuy commented 4 months ago

Small bump: Implementation of the proof of concept of MySQL in #552 is using transferable now (as stated preference) and the feature was agreed to be of (vague hehe) interest.

Should we get into a review phase for this proof of concept PR?

OverkillGuy commented 3 months ago

Wow, thank you for merging that PR! (I expected more of a fight, haha) I feel like our Postgres friends would be missing out on this, so I'll try to draft a similar PR for pg, will link here.

Can anyone think of other DbContainer derivatives that may benefit from this feature too?

alexanderankin commented 3 months ago

Maybe documentation about it would be good

On Sat, May 11, 2024, 11:47 AM Jb DOYON @.***> wrote:

Wow, thank you for merging that PR! (I expected more of a fight, haha) I feel like our Postgres friends would be missing out on this, so I'll try to draft a similar PR for pg, will link here.

Can anyone think of other DbContainer derivatives that may benefit from this feature too?

— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-python/issues/541#issuecomment-2105937443, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACECGJHKCDXQ2XS6IX3PPRDZBY4PJAVCNFSM6AAAAABGKF4JQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBVHEZTONBUGM . You are receiving this because you commented.Message ID: @.***>

OverkillGuy commented 3 months ago

See #576 which fixes the doctest of mysql, which acted as documentation for the feature, and adds the same feature to postgres.

I'm interested in better documentation for the feature, but I'm not sure how that could look like? This is a per-image feature worth a paragraph or two, as I've got it set up right now, no?