zonkyio / embedded-database-spring-test

A library for creating isolated embedded databases for Spring-powered integration tests.
Apache License 2.0
400 stars 37 forks source link

Question: User creation in a Flyway SQL migration #276

Open lobodpav opened 1 month ago

lobodpav commented 1 month ago

The background

In our application, we have a versioned Flyway migration script that creates a user in all our environments using

CREATE USER a_new_user PASSWORD 'a_new_password'

When we run our integration tests using

@AutoConfigureEmbeddedDatabase(provider = AutoConfigureEmbeddedDatabase.DatabaseProvider.ZONKY, refresh = AutoConfigureEmbeddedDatabase.RefreshMode.AFTER_EACH_TEST_METHOD)

all tests after the first one fail on

14:39:02.278 [postgres:pid(18371)] INFO  i.z.t.d.p.embedded.EmbeddedPostgres - 2024-08-01 14:39:02.278 CEST [18428] ERROR:  role "a_new_user" already exists
14:39:02.279 [postgres:pid(18371)] INFO  i.z.t.d.p.embedded.EmbeddedPostgres -  CREATE USER a_new_user PASSWORD 'a_new_user'
14:39:02.282 [Test worker] WARN  o.s.test.context.TestContextManager - Caught exception while invoking 'beforeTestMethod' callback TestExecutionListener [ourApp.test.spock.IntegrationSpecExecutionListener] for test method [public void ourApp.ControllerIntegrationSpec.$spock_feature_0_0()] and test instance [ourApp.ControllerIntegrationSpec@2c4748af]
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction
Caused by: io.zonky.test.db.provider.ProviderException: Unexpected error when prefetching a database
Caused by: org.postgresql.util.PSQLException: ERROR: role "a_new_user" already exists

When debugging the io.zonky.test.db.context.DefaultDatabaseContext#refreshDatabase, the testPreparers contain loads of queries, with the CREATE USER a_new_user PASSWORD 'a_new_password' being one of those.

The question

Is it possible to somehow exclude user creation from becoming part of the template that is restored?

I am still unsure what the root cause is. I thought that the user in the template was restored to a new database where it can't conflict. Obviously, this is not the case.

tomix26 commented 1 month ago

Hi Pavel, long time no see, I hope you're doing well.

Regarding the first part of your question, users and roles in a PostgreSQL database are shared objects. So in certain cases where Flyway migrations need to be processed multiple times, this error can occur. To prevent this problem, it's necessary to use "CREATE ... IF NOT EXISTS" statements for shared objects. In cases where this type of statement is not supported, you need to use procedures, see the example below:

DO
$$
BEGIN
  IF NOT EXISTS (SELECT * FROM pg_user WHERE usename = 'a_new_user') THEN
     CREATE USER a_new_user PASSWORD 'a_new_password';
  END IF;
END
$$;

As for the second part of the question, what exactly do you mean by "loads of queries" appearing in testPreparers? What type of preparer is specifically present here? Because the only one that can contain SQL queries directly is ReplayableDatabasePreparer, but that should only be in corePreparers, not in testPreparers. Moreover, if it's a Flyway migration, it should be processed using MigrateFlywayDatabasePreparer, which provides much higher performance.

In essence, corePreparers should contain migrations that run during Spring context initialization, which are common to all tests and easily cacheable. While testPreparers contain migrations that are common only to one test class.

This brings me to the question, how are Flyway migrations being executed? Through a Flyway bean as part of Spring context initialization or within the test itself? Are you using any annotations on the test class that generate SQL queries? For example, @FlywayTest, Spring's @Sql annotation, or any other?

It would be good to look at these details more closely so we can better identify the cause of the problem.

lobodpav commented 1 month ago

Hi Tomáš. It's been some time since we last met! Good to talk to you again and I hope you're well too.

A little bit of praise - your framework is amazing; we love it and use it every day.

This brings me to the question, how are Flyway migrations being executed?

We run our integration tests using @SpringBootApplication and we neither use @FlywayTest nor @Sql annotations - each test generates the data it needs using code.

what exactly do you mean by "loads of queries" appearing in testPreparers? What type of preparer is specifically present here?

The corePreparers are completely empty whereas the testPreparers have got around 5k records in them, most of these being Flyway Migration queries, and some similar to

Record{thisId=connection4, methodName=prepareStatement, arguments=[Record{thisId=connection4, methodName=prepareStatement, arguments=[SELECT COUNT(*) FROM pg_namespace WHERE nspname=?], resultId=preparedStatement116}

Regarding the first part of your question, users and roles in a PostgreSQL database are shared objects. So in certain cases where Flyway migrations need to be processed multiple times, this error can occur.

I am wondering what would cause the Flyway script to be processed multiple times. Did you mean repeatable migrations? If so, we use repeatable migrations only for DML queries related to table content.

In versioned migrations, we never use the IF NOT EXIST concept and altering the migrations only because of failing tests would be a workaround.

I probably need to better understand how the database reset and the preparers work, to find out the root cause to implement a proper fix.

tomix26 commented 1 month ago

the testPreparers have got around 5k records

Do you mean that testPreparers contains a list of 5k preparers, or that there's only one preparer but with 5k records? I assume it's the latter and that the preparer type is ReplayableDatabasePreparer, is that correct?

I am wondering what would cause the Flyway script to be processed multiple times. Did you mean repeatable migrations?

No, I'm referring to versioned Flyway migrations. This can simply occur when a specific test configuration leads to the initialization of multiple Spring contexts, where each context processes its own instance of the Flyway bean. The target databases for different Spring contexts are of course different, but shared objects (like users or roles) can pose a problem in such cases.

I probably need to better understand how the database reset and the preparers work

As I mentioned in my previous comment, at the start of the build during Spring context initialization, database operations are converted to core preparers. These are then collected after context initialization and used to prepare a template database from which independent databases for individual tests are then created. These preparers also serve as cache keys, so if different tests refer to the same core preparers, a database is created for them using the template database (a very fast operation - essentially binary cloning of databases supported by Postgres). If the preparers don't match for some reason (typically due to different Spring or Flyway configurations), it's necessary to create a new database from scratch for that test and process all Flyway migrations again, which may cause complications with shared objects and also takes much longer.

In versioned migrations, we never use the IF NOT EXIST concept and altering the migrations only because of failing tests would be a workaround

In case all tests have the same Spring and Flyway configuration and you perform database reset only using RefreshMode.AFTER_EACH_TEST_METHOD (not using the @FlywayTest annotation), all tests should always have identical core preparers and there should be no need to perform this workaround.

The corePreparers are completely empty

This is definitely wrong. In such case, the library cannot function optimally because database caching doesn't work as it should. There are two things that are wrong here:

1. ReplayableDatabasePreparer in testPreparers

This type of preparer should under all circumstances be only in corePreparers. I've been going through the code and I really don't know how it's possible that it got into testPreparers. In my opinion, it simply shouldn't be possible. Do you have any customizations in your project that could cause this?

If not, for further investigation I would need you to enable trace logging for the package io.zonky.test.db and provide me the log. You can send it to my email tomix26@gmail.com if you're concerned that there might be potentially sensitive information.

Another thing that would help me is adding a breakpoint to the DefaultDatabaseContext#apply method on line 208: testPreparers.add(preparer); with the condition preparer instanceof ReplayableDatabasePreparer. This should help us trace how the records are getting into testPreparers.

2. Presence of ReplayableDatabasePreparer instead of MigrateFlywayDatabasePreparer

This is probably caused by not having Flyway objects exposed as Spring beans in the Spring context. This library needs to have access to Flyway beans to apply certain AOP techniques for everything to work as it should. Otherwise, ReplayableDatabasePreparer is used, which is a fallback method where everything should still work, but slower.


I guess that fixing the second point will likely "solve" the first issue as well, as it would result in using a MigrateFlywayDatabasePreparer instead of the ReplayableDatabasePreparer. However, if this isn't caused by any custom database context handling in your project, it would still be valuable to investigate the root cause of the first issue, as it could potentially affect other users of the library.

lobodpav commented 1 month ago

A few notes:

  1. We do not modify any context in our tests, just using the @AutoConfigureEmbeddedDatabase annotation.
  2. A unconditional breakpoint at the beginning of the io.zonky.test.db.context.DefaultDatabaseContext#apply did not stop the application during debugging at all. A breakpoint in the io.zonky.test.db.context.DefaultDatabaseContext#refreshDatabase will stop during test debugging.
    1. The first stop shown empty preparers.
    2. All follow up stops revealed a single ReplayableDatabasePreparerImpl{recordDataSize=3472, estimatedDuration=1629} in corePreparers.
    3. So either I'm out of my mind or I was really tired the last time during debugging.
  3. I have sent you a TRACE log defined for the io.zonky.test package.

It seems it might be 100 times faster to jump on a live-session debugging call, because something seems to be way off.

tomix26 commented 1 month ago

Although I expected you to send the entire log, it doesn't matter in the end. Because if there's only one ReplayableDatabasePreparerImpl and it's stored in corePreparers, then everything works as expected.

The occurrence of the error _org.postgresql.util.PSQLException: ERROR: role "a_newuser" already exists is related to how ReplayableDatabasePreparer works. First, the Spring context starts with an empty database. Then, SQL queries performed during context startup are captured into ReplayableDatabasePreparer. Finally, the preparer is used to create a template database. Only after these steps it is possible to clone databases and avoid further execution of SQL queries.

That's simply how ReplayableDatabasePreparer works. It's not optimal, doesn't provide maximum performance, which is why it's only a fallback solution. To achieve maximum performance and prevent unnecessary repeated SQL query calls, you need to get MigrateFlywayDatabasePreparer working. As I mentioned in my previous comment, to get it working, you need to adjust your Spring context configuration so that Flyway beans exist in it, allowing the library to load and process them.

To clarify what I mean, in your code you probably have Flyway migration execution implemented something like this:

@Bean
public FlywayMigrationInitializer flywayInitializer(FlywayMigrationStrategy migrationStrategy) {
    FluentConfiguration config = Flyway.configure();
    // ...
    Flyway flyway = config.load();
    return new FlywayMigrationInitializer(flyway, migrationStrategy);
}

While for the library to work optimally, the configuration should look like this. In other words, the Flyway instance must be exposed as a Spring bean (of course, there can be more than one Flyway bean if needed):

@Bean
public Flyway flyway() {
    FluentConfiguration config = Flyway.configure();
    // ...
    return config.load();
}

@Bean
public FlywayMigrationInitializer flywayInitializer(Flyway flyway, FlywayMigrationStrategy migrationStrategy) {
    return new FlywayMigrationInitializer(flyway, migrationStrategy);
}

I hope it's clear enough. If not, we can schedule an online meeting to discuss it further. However, I believe that if you follow the steps above, everything should start working properly.

lobodpav commented 1 month ago

Thanks for the very detailed guide. However, our setup is different from what you expected.

We have a single flyway bean exposed. This one manages tenant-agnostic migrations:

@Bean
fun commonSqlFlyway(basicFluentConfiguration: FluentConfiguration): Flyway

The commonSqlFlyway bean is used by the CommonFlywaySqlMigrationExecutor component that executes the migrations in a @PostConstruct.

@Component("commonFlywaySqlMigrationExecutor")
class CommonFlywaySqlMigrationExecutor(
    private val commonSqlFlyway: Flyway,
) {

    @PostConstruct
    fun migrate() {
        commonSqlFlyway.migrate()
    }
}

Once the common migrations are completed, the TenantFlywaySqlMigrationExecutor kicks in its @PostConstruct. Flyway instances for each tenant are created on the fly and migrations are executed. We instantiate the tenant-specific Flyway instances manually because we need to pass a bunch of placeholders to Flyway and the values are taken from the database.

@Component
@DependsOn("commonFlywaySqlMigrationExecutor")
class TenantFlywaySqlMigrationExecutor(
    private val tenantSqlFluentConfiguration: FluentConfiguration,
    private val tenantRepository: TenantRepository,
) {

    @PostConstruct
    fun migrate() {
        tenantRepository.findAll().forEach {
            createConcreteTenantFlyway(it).migrate()
        }
    }

    private fun createConcreteTenantFlyway(tenantEntity: TenantEntity): Flyway =
        FluentConfiguration().configuration(tenantSqlFluentConfiguration)
            .schemas(tenantEntity.schema)
            .placeholders(mapOf(
                "tenant_id" to tenantEntity.id.toString(),
                etc.
            ))
            .load()
}

Given the above, I am not sure If we can achieve the setup you described.

tomix26 commented 1 month ago

I think that in your case, it might be possible to use the following solution:

@Component
@DependsOn("commonFlywaySqlMigrationExecutor")
class TenantFlywaySqlMigrationExecutor(
    private val tenantSqlFluentConfiguration: FluentConfiguration,
    private val tenantRepository: TenantRepository,
    private val flywayDatabaseExtension: FlywayDatabaseExtension,
) {

    @PostConstruct
    fun migrate() {
        tenantRepository.findAll().forEach {
            flywayDatabaseExtension.postProcessBeforeInitialization(createConcreteTenantFlyway(it), null).migrate()
        }
    }

    private fun createConcreteTenantFlyway(tenantEntity: TenantEntity): Flyway =
        FluentConfiguration().configuration(tenantSqlFluentConfiguration)
            .schemas(tenantEntity.schema)
            .placeholders(mapOf(
                "tenant_id" to tenantEntity.id.toString(),
                etc.
            ))
            .load()
}

However, I've never tested it, so I'm not sure if it works, and it's a little hacky solution.

lobodpav commented 1 month ago

Thanks a lot for all the details!

I found the root cause but I'm not yet sure how to fix it other than in your framework.

What happens in our app is that the FlywayDatabaseExtension is created after the commonSqlFlyway bean. That is why the FlywayDatabaseExtension does not catch the Flyway bean creation and hence the FlywayDatabasePreparer is not used.

I have created an own post processor and it behaves exactly the same way - it is registered after the Flyway bean.

@Component
class LoggingBeanPostProcessor : BeanPostProcessor {

    private companion object : Logger()

    init { log.info("!!!!!! LoggingBeanPostProcessor created") }

    override fun postProcessBeforeInitialization(bean: Any, beanName: String): Any? {
        if (bean is Flyway) log.info("!!!!!! Hooray - found a Flyway bean: '$beanName'")
        return bean
    }
}

However, when the LoggingBeanPostProcessor implements PriorityOrdered with override fun getOrder(): Int = Int.MIN_VALUE, the post-processor correctly captures the Flyway bean creation.

It's weird that the LoggingBeanPostProcessor captures the creation of the CommonFlywaySqlMigrationExecutor bean but not the Flyway bean itself. It seems like some higher-priority bean depends on the Flyway bean in a way it was prioritised over the post-processor itself.

In the actuator, I can see that these beans depend on commonSqlFlyway:

Just for the record, we use Java 21, Spring Boot 3.3.0 along with Flyway 10.

tomix26 commented 1 month ago

Implementing some form of ordering for processors certainly makes sense to me. I would prefer Ordered over PriorityOrdered, as the latter is typically used for highly technical internal processors. I'll think about it a little more and prepare a pull request.

However, even without using the Ordered interface, all processors (including those without the Ordered interface) should be registered before the initialization of all standard singleton beans. So if your Flyway bean is initializing earlier, its initialization is likely being forced by another processor with higher priority or in some other non-standard way.

lobodpav commented 1 month ago

So if your Flyway bean is initializing earlier, its initialization is likely being forced by another processor with higher priority or in some other non-standard way.

After some investigation, I have found out that we have a bean post-processor that auto-wires a JPA repository that enforces Flyway bean creation earlier than the FlywayDatabaseExtension is instantiated.

The way I solved it was to mark the auto-wired repository with @Lazy to delay the Flyway bean initialisation later until used. With @Lazy in place, the FlywayDatabasePreparer appeared on the corePreparers in the DefaultDatabaseContext and the tests failed no more.

Thanks!!!