fix(postgres): Add seed feature to postgres

testcontainers / testcontainers-python

Testcontainers is a Python library that providing a friendly API to run Docker container. It is designed to create runtime environment to use during your automatic tests.

https://testcontainers-python.readthedocs.io/en/latest/

Apache License 2.0

1.51k stars 281 forks source link

fix(postgres): Add seed feature to postgres #576

Open OverkillGuy opened 3 months ago

OverkillGuy commented 3 months ago

Ref #541, implement the feature for postgres

Fix mysql "seed" feature's docstring (whoops)

OverkillGuy commented 3 months ago

This duplicated transfer_seed function from mysql makes me think it could be done via a more reusable "Mixin" class, or something, but that's up for discussion if the complexity overhead is worth the removal of duplication.

alexanderankin commented 3 months ago

I believe the overhead is not worth it, duplication is fine as we probably don't want to be in the business of maintaining that code in core. Core should stay clean/lean

On Mon, May 20, 2024, 7:17 PM Jb DOYON @.***> wrote:

This duplicated transfer_seed function from mysql makes me think it could be done via a more reusable "Mixin" class, or something, but that's up for discussion if the complexity overhead is worth the removal of duplication.

— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-python/pull/576#issuecomment-2121471474, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACECGJB454KBY5DEQ5TYI43ZDKHCTAVCNFSM6AAAAABIASPHN6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRRGQ3TCNBXGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

alexanderankin commented 3 months ago

in fact if you can remove transfer_seed from the generic class that would be great, we are planning to remove that eventually. i dont know how the mysql tests ever passed, without even looking at any test executions - you have to do the following dance if you want to place files into the container before the entrypoint executes (and this would go into core if we ever make it generic enough, like the DockerContainer):

loop
sleep for half a second
if some file exists, break
go to loop
execute the container image's original entrypoint (specific to the type of image the container is intended to be used with, for example: in mysql: docker image inspect mysql:8-oracle | jq '.[].Config.Entrypoint | .[]' -r prints docker-entrypoint.sh)

and then after you start the container, you can copy the files into it asynchronously from python, do whatever you have to do, and then create the file you are using an IPC signal in step 3 above.

you can see this implementation in the kafka container

OverkillGuy commented 3 months ago

I'll do anything that makes this feature more maintainable. Just want to make sure I understand your message.

The implementation provided here + mysql is indeed "racing" between our container.put_archive() command vs the container.start()'s entrypoint hoping that we can transfer the files before the mysql/postgres entrypoint hasn't finished yet.

That's bad because unreliable, on principle, as depending on winning a race is a recipe for hard to pin down bugs. But I expect worked fine here (in practice) because the start-up time of these dbs is in the multiple seconds, plenty of time to copy across a folder, so we always "won that race", enough that the feature worked.

I do see that the proposed solution you posted is nicer, in that it avoids the race. Paraphrasing, it seems to override the entrypoint to replace it with our own waiter script, giving time for put_archive to complete, blocking the script until some "sentinel" files are created, marking time to continue by executing the original entrypoint.

I like that solution, and I think I should be able to port this over to here, maybe even in a way that can be moved to the core class? Will investigate.

OverkillGuy commented 3 months ago

And yes remove the transfer_seed and DbContainer "bits" too

alexanderankin commented 3 months ago

I think we are on the same page.

here is the specific code i was referring to:

https://github.com/testcontainers/testcontainers-python/blob/9d2ceb6a80782085ae20511343577333452bd6d4/modules/kafka/testcontainers/kafka/__init__.py#L78-L82

OverkillGuy commented 3 months ago

Okay I think I have an experimental version of the wait-for-sentinel system for mysql, to polish as PR soon, then use in postgres too.

Logic is mostly now in DbContainer class, nothing prevents moving it up to DockerContainer in general, but conceptual: what would the folder-transfer feature even mean for DockerContainer, when for DbContainer it's of interest to "mount" a folder.

One thing to note on the reference implementation: the Kafka solution linked is reusing the TC_START_SCRIPT path as both sentinel file to wait for presence of (regardless of content, if it exists we're done), and the content of that file is also original command, to re-launch after waiting via exec. But there's no particular reason to use that file path both as sentinel and as command to execute afterwards.

Instead I've locally got the inspect_image() part of the docker-py low-level API, to get to the original image command, and set that command from before. The most challenging part was to find a way to not bork the init, after change of start command = redirection of PID 1. I have to source the original entrypoint by hand, and relaunch original command. This is a little image-specific and may need a second pair of eyes to get generic enough.

So, yeah, I'll send it soon, but I thought some of these points were worth discussing now, not having to wait for the PR itself.

OverkillGuy commented 3 months ago

Okay, just sent back the rewritten solution per above thread, using (for now) a DbContainer-level command override, like Kafka does, and with per-image seed mount location, and startup command prefix.

It works fine for mysql, but the postgres image one exits (see the last commit's message). Seems that for mysql/postgres, the actual command (mysqld and postgres) needs the entrypoint script to be basically PID 1 or so, which is what the self.startup_command trick is for: source the original script, and run that main to set envvars properly.

I'm fully open to moving this dbcontainer-level thing up to DockerContainer, just not sure how that would conceptually fit.

Sending this for a look over, while I try to understand the exact issue around postgres entrypoint causing exits, to be fixed.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 0% with 35 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@0ce4fec). Learn more about missing BASE report.

Files	Patch %	Lines
core/testcontainers/core/generic.py	0.00%	35 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #576 +/- ## ======================================= Coverage ? 72.75% ======================================= Files ? 11 Lines ? 613 Branches ? 87 ======================================= Hits ? 446 Misses ? 141 Partials ? 26 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.