zonkyio / embedded-postgres

Java embedded PostgreSQL component for testing
Apache License 2.0
344 stars 43 forks source link

PrepPipeline daemon thread shutdown #65

Open amumurst opened 3 years ago

amumurst commented 3 years ago

Make PreparedDbProvider AutoCloseable such that the internal background thread started in PrepPipeline can be shut down without needing to shut down the entire JVM.

This is useful for usage in build tools that run in interactive mode, eg scalas sbt. Here you typically run sbt then a set of test -> test -> test (running tests multiple times without shutting down the underlying JVM in between. The current implementation then keeps multiple versions of preparers running in the background since the daemon thread is not killed.

I am not too familiar with the internals here so there might be more to clean up, but I think this would work.

tomix26 commented 3 years ago

I'm afraid this cannot be changed that easily, because it's intended behavior. If there are multiple tests running in a sequence, it is desired to start only a single database cluster (postgres instance) for all these tests. Typically, most of the tests share the same preparer configuration, so in the result there should be only a single or very limited number of background threads.

If your experience is different, please double check what implementation of DatabasePreparer you are using and whether it has the equals and hashCode methods implemented properly. The default providers io.zonky.test.db.postgres.embedded.FlywayPreparer and io.zonky.test.db.postgres.embedded.LiquibasePreparer should be implemented correctly, so you can use them as reference.

To make it more clear, if I have the following test class and run it several times, it should always refer to the same database cluster. And only if I change the migration location parameter (preparer configuration), it will use a new database cluster.

public class FlywayPreparerTest {

    @Rule
    public PreparedDbRule db = EmbeddedPostgresRules.preparedDatabase(FlywayPreparer.forClasspathLocation("db/testing"));

    @Test
    public void test() throws Exception {
        ...
    }
}
amumurst commented 3 years ago

Okay so perhaps I can clarify further. I added the basics for running exactly the LiquibasePreparerTest already found in this repo from sbt here https://github.com/amumurst/embeddedpostgres-postgres-background-threads

Screenshot 2021-07-08 at 00 07 33

As you see sbt is here ran as a program and not a script (as typically mvn or gradle would be). This makes it not start a new JVM for every test run. Running ps aux | grep '[p]ostgres' -c to count the amount of postgres processes running between each test, we can see that it increases by 7 for every run.

(There is an option for sbt to fork a new JVM for each testrun, but that would mean much slower testing).

7 each time might not sound like much, but if I run ~test aka. incremental testing on codechange it quickly becomes a lot and typically eventually OOMs.

Right now the Thread internally in PrepPipeline creates databases in the nextDatabase queue until the JVM is shutdown and exposes no way of shutting it down. I am not sure why it seems to create 7 processes exactly every time though.

So what I am basically looking for is some method for shutting down these processes, and I thought killing the internal thread in PrepPipeline would be the way.

amumurst commented 3 years ago

Okay, so I attempted to do this with a hack (finding all threads containing cluster in the running JVM and interruputing them), but it does not remove the postgres processes. What I found is that these postgres processes are contained in from PrepPipeline::pg. This is of type EmbeddedPostgres and implements Closeable, but since PrepPipeline is never closed, the pg variable is also never shut down.

If I run an even worse hack with manually finding and running the shutdown hoooks added by EmbeddedPostgres::newCloserThread in the function EmbeddedPostgres::startPostmaster the postgres threads are removed.

    val field = Class.forName("java.lang.ApplicationShutdownHooks").getDeclaredField("hooks")
    field.setAccessible(true)
    val hooks =field.get(null)
    val threads: List[Thread] = hooks.asInstanceOf[java.util.Map[Thread, Thread]].asScala.keySet.toList
    val postgresThreads = threads.filter(thread => thread.getName.contains("postgres") && thread.getName.contains("closer"))
    postgresThreads.foreach(thread => Runtime.getRuntime.removeShutdownHook(thread))
    postgresThreads.foreach(thread => thread.run())
tomix26 commented 3 years ago

@amumurst Thanks a lot for the reproducer, it made it a lot easier for me to investigate the problem.

I have confirmed that the SBT server is indeed using only a single JVM process. However, the problem is that the server uses a custom class loader, which creates the same classes over and over again each time the project is run. So within the same JVM process, there are then duplicate classes that have no relation to each other. This makes a caching mechanism, which is used to store instances of the embedded databases, impossible to work properly. Because the caching mechanism depends on a static variable, and in this case the static variable refers to a new object every time the project is run.

So to solve this problem, we need to figure out how to implement a cache that persists data between multiple runs of the project that is run via the SBT server. Or alternatively, implement a special shutdown hook that will be called when the project is shutting down and not when the SBT server (the JVM process) is shutting down.