zonkyio / embedded-postgres

Java embedded PostgreSQL component for testing
Apache License 2.0

Don't allow multiple initdb processes on OSX. #42

Open jameshilliard opened 3 years ago

jameshilliard commented 3 years ago

Fixes #157:

[initdb:pid(95804)] INFO  i.z.t.d.p.embedded.EmbeddedPostgres - 2020-09-17 12:59:51.996 MDT [95916] DETAIL:  Failed system call was shmget(key=5432002, size=56, 03600).

@mdavydau You might be interested in this.

This issue only seems to appear when running parallel tests where multiple initdb processes could be active.

As I understand it, the error occurs because initdb runs shared memory allocation tests to choose a value for the shared_buffers config option. On OSX these tests alone allocate a large share of the default usable system shared memory, so when more than one initdb process runs at a time, the concurrent tests are very likely to request more shared memory than is available and fail.
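
One way to keep initdb runs from overlapping across separate test worker JVMs is an exclusive lock on a shared file. The following is a minimal sketch under assumptions, not the code in this PR: the class name InitDbLock, the method withExclusiveLock, and the choice of lock file are hypothetical, and the action passed in stands for whatever actually launches initdb.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;

// Hypothetical helper: runs an action while holding an exclusive lock on a file.
public final class InitDbLock {

    // File locks are held per JVM, so threads inside one JVM need their own guard;
    // without it, a second thread locking the same file would get an
    // OverlappingFileLockException instead of waiting.
    private static final Object IN_JVM_GUARD = new Object();

    private InitDbLock() {
    }

    public static <T> T withExclusiveLock(Path lockFile, Callable<T> action) throws Exception {
        synchronized (IN_JVM_GUARD) {
            try (FileChannel channel = FileChannel.open(
                         lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 // lock() blocks until no other process holds a lock on this file.
                 FileLock lock = channel.lock()) {
                // Assumption: the action is the step that starts initdb and waits for it to exit.
                return action.call();
            }
            // The lock is released first, then the channel is closed.
        }
    }
}
```

Every worker would need to agree on the same lock file path (for example a fixed file under the system temp directory) for this to serialize initdb across processes.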

tomix26 commented 3 years ago

I encountered a similar problem that was caused by the fact that postgres processes were not terminated properly if a fatal error occurred during the test execution. Otherwise, everything works fine for me, even when I am running multiple tests at once. So could you please check the postgres processes?
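
For checking this from the test JVM itself, here is a rough sketch that lists running processes whose executable is named postgres or initdb. It is only an illustration: the class and method names are hypothetical, it needs Java 9 or newer for ProcessHandle, and the operating system may hide the command of processes owned by other users.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic: finds live processes that look like postgres or initdb.
public final class LeftoverPostgresCheck {

    private LeftoverPostgresCheck() {
    }

    // Compares only the executable's file name, ignoring its directory.
    public static boolean looksLikePostgres(String command) {
        int lastSeparator = Math.max(command.lastIndexOf('/'), command.lastIndexOf('\\'));
        String name = command.substring(lastSeparator + 1);
        return name.equals("postgres") || name.equals("postgres.exe")
                || name.equals("initdb") || name.equals("initdb.exe");
    }

    public static List<ProcessHandle> findLeftovers() {
        return ProcessHandle.allProcesses()
                .filter(ProcessHandle::isAlive)
                // command() is empty when the OS does not expose it; treat that as no match.
                .filter(p -> p.info().command()
                        .map(LeftoverPostgresCheck::looksLikePostgres)
                        .orElse(false))
                .collect(Collectors.toList());
    }
}
```

Calling findLeftovers() after a test run and printing each handle's pid() gives a quick view of anything that was not shut down.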

There is a related issue: https://github.com/zonkyio/embedded-database-spring-test/issues/105

jameshilliard commented 3 years ago

> I encountered a similar problem that was caused by the fact that postgres processes were not terminated properly if a fatal error occurred during the test execution.

I thought I fixed that bug in #39; I haven't seen it since updating to the latest release.

> Otherwise, everything works fine for me, even when I am running multiple tests at once.

I see this issue when running tests on a project with a Gradle test worker for every virtual core. My MacBook Pro has 8 cores and 16 virtual cores, so a very large number of initdb and postgresql processes are launched simultaneously, which greatly increases the chances of hitting the conditions necessary to reproduce this bug (multiple initdb processes running at the same time).

My test suite for this project also has a very large number of tests using postgresql, which as I understand it is also required to reproduce this.

This is the Gradle option I'm using to spin up the parallel test workers:

maxParallelForks = Runtime.runtime.availableProcessors()

> So could you please check the postgres processes?

I've already checked that and it is not the cause. I can reproduce this bug very reliably (I hit it effectively 100% of the time in one of my projects on my MacBook Pro) and have confirmed that this change fully fixes the test failures.

tomix26 commented 3 years ago

> I thought I fixed that bug in #39; I haven't seen it since updating to the latest release.

I thought so too, but when I checked, there were still some processes that had not been terminated properly.

> I see this issue when running tests on a project with a Gradle test worker for every virtual core. My MacBook Pro has 8 cores and 16 virtual cores, so a very large number of initdb and postgresql processes are launched simultaneously, which greatly increases the chances of hitting the conditions necessary to reproduce this bug (multiple initdb processes running at the same time).

> My test suite for this project also has a very large number of tests using postgresql, which as I understand it is also required to reproduce this.

> This is the Gradle option I'm using to spin up the parallel test workers:

> maxParallelForks = Runtime.runtime.availableProcessors()

> So could you please check the postgres processes?

> I've already checked that and it is not the cause. I can reproduce this bug very reliably (I hit it effectively 100% of the time in one of my projects on my MacBook Pro) and have confirmed that this change fully fixes the test failures.

Could you please provide some simple reproducer?

jameshilliard commented 3 years ago

> Could you please provide some simple reproducer?

I pushed a reproducer along with an attempt at covering the additional conditions I discovered that trigger this. The reproducer is pretty aggressive, and I haven't been able to fix all the issues so far.