peterldowns / pgtestdb

quickly run tests in their own temporary, isolated, postgres databases
MIT License
186 stars 14 forks source link

Close connection to base DB after cleanup #5

Closed ebabani closed 8 months ago

ebabani commented 10 months ago

When running multiple quick tests in parallel it's possible to get a connection failure due to too many clients connected.

peterldowns commented 10 months ago

Interesting, thank you for reporting the issue and for suggesting a fix. Can you explain any more about how you hit the problem and how it manifests? Any suggested way for me to try to reproduce the issue? Running multiple quick tests in parallel is the expected use case of this library and I haven't seen this problem myself.

I'll have time later in the week to look into this further and I'll use this PR as a starting point. Thank you again for writing in.

ebabani commented 10 months ago

Hi,

I pushed another change to the tests and docker compose. This issues only shows up when the number of parallel tests can potentially be higher than the max connections to the base DB.

I updated the docker compose to set max-connections to 100, and the parallel tests to run 50 tests in parallel.

Running without this change you should see tests failing due to too many connections.

In our code base we use Ginkgo for writing and running tests. After we switched to pgtestdb we noticed that running our ~200 DB related tests in parallel would hit our connection limit (100), but not when running in serial which led me to look at the cleanup code.

peterldowns commented 8 months ago

@ebabani hi, thanks for your patience. I've fixed the implementation of create() to only use a single database connection at any point in time — previously, as you pointed out, it used 2 simultaneous connections. I was able to verify the behavior by adding some time.Sleep calls in the library and then connecting to the test database while running some tests:

Now, when you call `pgtestdb.New()` or `pgtestdb.Custom()`, pgtestdb connects, ensures the template exists, creates the instance db, then cleans everything up like before, but only uses a single connection to the server at any point in time. * Here, you see a single active connection to the `postgres` database to start (that's me connected in `psql`.) ```psql postgres=# SELECT datname, sum(numbackends) FROM pg_stat_database WHERE datname is not null group by 1; datname | sum ---------------------------------------------+----- postgres | 1 template0 | 0 template1 | 0 ``` * I start running a test. `pgtestdb` connects to the `postgres` database as well, this is the `baseDB` in the code. It uses this connection to create the template database, run migrations, and then create the instance database for the specific test. ```psql postgres=# SELECT datname, sum(numbackends) FROM pg_stat_database WHERE datname is not null group by 1; datname | sum ---------------------------------------------+----- postgres | 2 template0 | 0 template1 | 0 ``` * `pgtestdb` connects to the instance database, then hands it to the test. this is the `db := pgtestdb.New(...)` connection. ```psql postgres=# SELECT datname, sum(numbackends) FROM pg_stat_database WHERE datname is not null group by 1; datname | sum -----------------------------------------------------------------+----- template1 | 0 testdb_tpl_ed8ae75db1176559951eadb85d6be0db_inst_8ed74b29d41d8c | 1 postgres | 1 testdb_tpl_ed8ae75db1176559951eadb85d6be0db | 0 template0 | 0 ``` * The test case finishes, and `pgtestdb` closes the connection to the instance, connects back to the `baseDB`, and deletes the instance. ```psql postgres=# SELECT datname, sum(numbackends) FROM pg_stat_database WHERE datname is not null group by 1; datname | sum ---------------------------------------------+----- postgres | 2 testdb_tpl_ed8ae75db1176559951eadb85d6be0db | 0 template0 | 0 template1 | 0 ``` * `pgtestdb` closes its connection, the test is entirely finished running. ```psql postgres=# SELECT datname, sum(numbackends) FROM pg_stat_database WHERE datname is not null group by 1; datname | sum ---------------------------------------------+----- postgres | 1 testdb_tpl_ed8ae75db1176559951eadb85d6be0db | 0 template0 | 0 template1 | 0 ```

Thanks again for reporting this bug. In my testing, re-connecting to the database adds some slight overhead per test, but it's fundamentally more correct, and the database connection time should be dominated by the time taken during each test so overall this is worth the performance impact because it allows you to double the amount of tests you are running at once!

I also updated the FAQ to explain the problem, show some example error messages, and provide some ideas for how to work around it.

peterldowns commented 8 months ago

^ github actions is having some issues right now but once they're running again, I'll go ahead and merge this.

ebabani commented 7 months ago

Awesome, thanks for the update and the new release. I'll try to update sometimes this week. Thanks for maintaining this project, has been really helpful in parallelizing and speeding up our tests.

peterldowns commented 7 months ago

@ebabani of course, you're welcome. Thank you for using it. Please report any further bugs or questions, I'm happy to make improvements.