OverkillGuy opened this issue 7 months ago
how does this compare to this functionality in the mysql and postgres images?
Good shout, I didn't know these existed (there goes my evening, reading the manual like I should have done first thing)
Let's say, at best, this proposed `seed` option can get reduced to just the local path, which we mount to those standard `/docker-entrypoint-initdb.d/` folders, simplifying most of the implementation. This could harmonize the feature across images that support it?
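For reference, the official mysql/postgres images implement this without any testcontainers support: any `*.sql`, `*.sql.gz`, or `*.sh` file in that folder is executed on the container's first start. A plain-docker sketch (image tag and host path are illustrative):

```shell
# Seed scripts from ./db/seeds run on FIRST start only
# (the folder is consulted while the data directory is still empty).
docker run -d \
  -e MYSQL_ROOT_PASSWORD=example \
  -v "$PWD/db/seeds:/docker-entrypoint-initdb.d:ro" \
  mysql:8
```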
Or we could just scrap the feature request altogether because it exists already as part of most images, haha. Either way works for me, the feature gap itself is solved by the links provided.
i think theoretically it could be a neat thing to highlight the feature in the library images, but since it seems like we are now discussing quite a different implementation than proposed in the PR, im going to close the PR
Makes sense!
Just to be clear, the only convenient way I see of exposing the `/docker-entrypoint-initdb.d/` folder is to mount it via `.with_volume_mapping()` before the container is created in Docker, which requires a change in the `DbContainer` derivatives themselves?

The less convenient way, which I don't recommend, is to have some kind of Dockerfile with:

```dockerfile
FROM somedatabaseimage
COPY db/seeds/ /docker-entrypoint-initdb.d/
```

and then use that image? That doesn't feel very convenient: building an intermediary image we don't care about (long term) just to get this one test to work.
With that in mind, I will explore a similar `seed` parameter as before, but with the intent to mount a volume to that well-known `/docker-entrypoint-initdb.d/` folder during `DbContainer` creation, though of course I welcome different solutions. I'm aiming for:

```python
with MysqlContainer(seed="db/seeds/"):
    # equivalent to:
    container.with_volume_mapping("db/seeds/", "/docker-entrypoint-initdb.d/")
```

I'll need to confirm whether this folder works for ALL `DbContainer` variants (in which case it's nice to move it to that level) or if it's an image-specific thing, in which case I'll leave it at the mysql or postgres level.
> I'm aiming for:
>
> ```python
> with MysqlContainer(seed="db/seeds/"):
>     # equivalent to:
>     container.with_volume_mapping("db/seeds/", "/docker-entrypoint-initdb.d/")
> ```

Implemented in #552 for MySQL, to further this discussion with a specific example. If that's deemed useful, I can do the same for postgres and then look for other images supporting this folder structure.
Edit to add: Note I'm not attached to any of these PRs, so feel free to chuck them out if that's not useful. I'm just happy to get familiar with the innards of this codebase for my own future uses as I explore this feature request, hoping the PRs make the discussion easier.
i think theoretically there is a preference for using transferable over volume mapping, because volume mapping only works locally (local to the docker engine), but we dont have that in the python implementation yet - it looks something like this - theoretically my implementation could be even more tied to the tar equivalent, and there's also an even more concise version of the "copy file to container" algorithm floating around in the codebase already, like in the kafka module. havent had time to really think about this too much but i approved your run and it looks like all the checks passed.
So, I was curious about `put_archive` and the "transferable" system, so I implemented it experimentally, and it didn't work:
The files did make it to the folder, but the tables weren't loaded in any of the tests, which I can see by running docker exec on breakpoints.
Can't prove it, but I think that since the archive can only be loaded once the container has started (whereas volume mounts are part of the container's creation), the mysql server is no longer looking at the /docker-entrypoint-initdb.d/ folder by the time the data gets in, so the test table stays empty and the test fails.
Attached is the patch (applies on top of the linked PR), in case it's useful anyway?
Curious if you have an idea on how to work around that.
EDIT: Wrong again, it's because I waited to do the archive transfer till AFTER the connect, not before, which waits for container ready, and moving it before makes it work again. Updated #552 to use this new system!
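For the record, the in-memory tar building that `put_archive` consumes needs only the standard library. A minimal sketch of the approach (the helper name is mine, not from the PR):

```python
import tarfile
from io import BytesIO
from pathlib import Path

def make_seed_archive(src: str) -> bytes:
    """Pack every file in `src` into an in-memory tar stream, ready for put_archive()."""
    src_path = Path(src)
    with BytesIO() as archive:
        # Close the TarFile before reading, so the end-of-archive
        # blocks are flushed into the buffer.
        with tarfile.TarFile(fileobj=archive, mode="w") as tar:
            for filename in src_path.iterdir():
                tar.add(filename.absolute(), arcname=str(filename.relative_to(src_path)))
        archive.seek(0)
        return archive.read()
```

The wrapped container would then receive the bytes with something like `put_archive("/docker-entrypoint-initdb.d/", make_seed_archive("db/seeds/"))`.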
Yes, transferring only works after the container starts, so you want to run that after calling super().start() in the overridden .start()
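Combining that with the EDIT above (transfer after the docker start, but before the readiness wait in `_connect()`), the ordering can be shown with toy stand-in classes; these are sketches only, not the real testcontainers classes:

```python
# Toy stand-ins to demonstrate call ordering only.
class DockerContainer:
    def start(self):
        self.events.append("docker start")  # container now exists: put_archive can work

class DbContainer(DockerContainer):
    def __init__(self):
        self.events = []

    def start(self):
        self._configure()
        super().start()
        self._transfer_seed()  # after start: the archive lands in initdb.d in time
        self._connect()        # readiness wait happens after seeding
        return self

    def _configure(self):
        self.events.append("configure")

    def _transfer_seed(self):
        self.events.append("transfer seed")

    def _connect(self):
        self.events.append("connect")
```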
On Thu, Apr 25, 2024, 9:20 PM, Jb DOYON wrote:

Patch: transferring using tar archives.

```patch
From ac0fc34717c27b53d3f291ca59b3218d2036c762 Mon Sep 17 00:00:00 2001
From: Jb DOYON
Date: Fri, 26 Apr 2024 02:06:06 +0100
Subject: [PATCH] Experiment with transfering via tar archives

 core/testcontainers/core/generic.py            |  4 ++++
 modules/mysql/testcontainers/mysql/__init__.py | 14 +++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/core/testcontainers/core/generic.py b/core/testcontainers/core/generic.py
index 515c283..23c24ea 100644
--- a/core/testcontainers/core/generic.py
+++ b/core/testcontainers/core/generic.py
@@ -71,7 +71,11 @@ class DbContainer(DockerContainer):
         self._configure()
         super().start()
         self._connect()
+        self._transfer_seed()
         return self
 
     def _configure(self) -> None:
         raise NotImplementedError
+
+    def _transfer_seed(self) -> None:
+        pass
diff --git a/modules/mysql/testcontainers/mysql/__init__.py b/modules/mysql/testcontainers/mysql/__init__.py
index e95b87d..c7ed97b 100644
--- a/modules/mysql/testcontainers/mysql/__init__.py
+++ b/modules/mysql/testcontainers/mysql/__init__.py
@@ -11,7 +11,10 @@ License for the specific language governing permissions and
 limitations under the License.
 
 import re
+import tarfile
+from io import BytesIO
 from os import environ
+from pathlib import Path
 from typing import Optional
 
 from testcontainers.core.generic import DbContainer
@@ -85,7 +88,7 @@ class MySqlContainer(DbContainer):
         if self.username == "root":
             self.root_password = self.password
         if seed is not None:
-            self.with_volume_mapping(seed, "/docker-entrypoint-initdb.d/")
+            self.seed = seed
 
     def _configure(self) -> None:
         self.with_env("MYSQL_ROOT_PASSWORD", self.root_password)
@@ -105,3 +108,12 @@ class MySqlContainer(DbContainer):
         return super()._create_connection_url(
             dialect="mysql+pymysql", username=self.username, password=self.password, dbname=self.dbname, port=self.port
         )
+
+    def _transfer_seed(self) -> None:
+        src_path = Path(self.seed)
+        dest_path = "/docker-entrypoint-initdb.d/"
+        with BytesIO() as archive, tarfile.TarFile(fileobj=archive, mode="w") as tar:
+            for filename in src_path.iterdir():
+                tar.add(filename.absolute(), arcname=filename.relative_to(src_path))
+            archive.seek(0)
+            self.get_wrapped_container().put_archive(dest_path, archive)
-- 
2.34.1
```
Small bump: the MySQL proof-of-concept implementation in #552 now uses transferable (as the stated preference), and the feature was agreed to be of (vague, hehe) interest.
Should we get into a review phase for this proof-of-concept PR?
Wow, thank you for merging that PR! (I expected more of a fight, haha) I feel like our Postgres friends would be missing out on this, so I'll try to draft a similar PR for pg, will link here.
Can anyone think of other DbContainer derivatives that may benefit from this feature too?
Maybe documentation about it would be good
See #576 which fixes the doctest of mysql, which acted as documentation for the feature, and adds the same feature to postgres.
I'm interested in better documentation for the feature, but I'm not sure what that would look like? As I've got it set up right now, this is a per-image feature worth a paragraph or two, no?
What are you trying to do?

In order to do meaningful tests using my fresh database, I wish to load arbitrary SQL scripts into my `DbContainer`-s.

I can do (and have done) this per-project, in test folders, but I believe there's value in allowing an arbitrary list of SQL scripts to be run before the testcontainer is yielded, and in making this a reusable pytest fixture.

I usually do this by passing a path to a folder with scripts, volume-mounting it to a `/seeds` path in the database container, and also passing a list of script files in that folder to be executed. Then, before yielding the ready db-container, I loop over each `script` filename and do `container.exec_run(["mysql", "-e", f"source /seeds/{script}"])`. The list of scripts means we can feed the schema first, then arbitrary sample data files.

I suggest a new `DbContainer._seed()` method, defaulting to `raise NotImplementedError`, to be overridden per database, allowing a new optional parameter `seed`. Each database then just has to define its `_seed()` function with the specific way to `exec_run` the scripts (`psql` for Postgres, `mysql` for the MySQL CLI...).

Example from #542 implementation draft:
Why should it be done this way?

I believe most users of testcontainers, beyond just testing connectivity with a database, want a simple-to-reproduce test environment with a realistic data structure and mock data. This is currently annoying to do locally, and can be upstreamed into a simple-looking `seed` parameter, which could enable the feature for most databases by pushing it into `DbContainer` itself.

See #542 for a draft implementation using MySQL as the target (generalizable to others, of course).