patka / cassandra-migration

Schema migration library for Cassandra
MIT License

Exception while trying to run multiple services connecting to same Cassandra #45

Open max-work opened 4 years ago

max-work commented 4 years ago

I am trying to run multiple instances of the same service against one Cassandra cluster. It looks like we are hitting a race condition when the migration scripts run: the server fails to start with the following exception. Please let me know if anyone has faced the same issue.

We are using version 2.2.0.

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'migrationTask' defined in class path resource [org/cognitor/cassandra/migration/spring/CassandraMigrationAutoConfiguration.class]: Invocation of init method failed; nested exception is org.cognitor.cassandra.migration.MigrationException: Error during migration of script 2_Change-13418.cql while executing 'DROP TABLE conversation_message_subscriber;'
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1778)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:593)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:849)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:142)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:775)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:316)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1260)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1248)
    at Application.main(Application.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
Caused by: org.cognitor.cassandra.migration.MigrationException: Error during migration of script 2_Change-13418.cql while executing 'DROP TABLE conversation_message_subscriber;'
    at org.cognitor.cassandra.migration.Database.execute(Database.java:187)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.cognitor.cassandra.migration.MigrationTask.migrate(MigrationTask.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1903)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1846)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1774)
    ... 24 common frames omitted
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table conversation_message_subscriber
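
The trace is consistent with two instances both deciding that script 2 is still pending before either has recorded it as applied. The following is a minimal, deterministic sketch of that interleaving, under the assumption that each instance reads the applied version once and later runs every newer script. `FakeSchema`, `MigrationInstance` and `RaceDemo` are hypothetical stand-ins, not classes from cassandra-migration, and the in-memory set of table names stands in for the keyspace.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a keyspace: table names plus the
// highest migration version recorded as applied.
class FakeSchema {
    final Set<String> tables = new HashSet<>();
    int appliedVersion = 1;

    void dropTable(String name) {
        // Mirrors Cassandra rejecting a DROP of a table that no longer exists.
        if (!tables.remove(name)) {
            throw new IllegalStateException("unconfigured table " + name);
        }
    }
}

// Hypothetical migration runner: reads the applied version first,
// then later executes script 2 if the version it read was older.
class MigrationInstance {
    private final FakeSchema schema;
    private int versionSeen;

    MigrationInstance(FakeSchema schema) {
        this.schema = schema;
    }

    void readVersion() {
        versionSeen = schema.appliedVersion;
    }

    void runScript2() {
        if (versionSeen < 2) {  // decision based on a possibly stale read
            schema.dropTable("conversation_message_subscriber");
            schema.appliedVersion = 2;
        }
    }
}

class RaceDemo {
    public static void main(String[] args) {
        FakeSchema schema = new FakeSchema();
        schema.tables.add("conversation_message_subscriber");

        MigrationInstance a = new MigrationInstance(schema);
        MigrationInstance b = new MigrationInstance(schema);

        // Both instances start up and see version 1 before either migrates.
        a.readVersion();
        b.readVersion();

        a.runScript2();  // succeeds and records version 2
        try {
            b.runScript2();  // stale read: issues the same DROP a second time
        } catch (IllegalStateException e) {
            System.out.println("instance b failed: " + e.getMessage());
        }
    }
}
```

An instance that reads the version only after `a` has finished sees version 2 and skips the script, which is why the failure depends on startup timing.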

patka commented 4 years ago

Hi,

I recommend upgrading to version 2.3, which should fix your problem. Version 2.3 implements leader election: one of the hosts running the library is selected to perform the migration, instead of all hosts trying to do it at the same time.
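
The general idea of leader election can be illustrated with an atomic "take the lease if it is free" operation. In Cassandra the natural primitive is a lightweight transaction, for example `INSERT ... IF NOT EXISTS USING TTL ...`, which only one concurrent writer can win. The sketch below is a hypothetical in-memory model, not the library's implementation: `LeaseTable`, the lease key `migration-lease` and the TTL handling are assumptions, and `ConcurrentHashMap.compute` plays the role of the LWT's compare-and-set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lease table: key -> current lease. In Cassandra the analogous
// record would be a row written with a lightweight transaction and a TTL.
class LeaseTable {
    static final class Lease {
        final String owner;
        final long expiresAt;  // epoch millis

        Lease(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();

    /**
     * Atomically takes the lease if it is free, expired, or already held by
     * this owner (renewal). Returns true if the caller is now the leader.
     */
    boolean tryAcquire(String key, String owner, long nowMillis, long ttlMillis) {
        Lease result = leases.compute(key, (k, current) -> {
            if (current == null || current.expiresAt <= nowMillis || current.owner.equals(owner)) {
                return new Lease(owner, nowMillis + ttlMillis);
            }
            return current;  // another owner holds a live lease
        });
        return result.owner.equals(owner);
    }

    /** Releases the lease only if the caller still holds it. */
    void release(String key, String owner) {
        // Returning null from computeIfPresent removes the mapping.
        leases.computeIfPresent(key, (k, current) -> current.owner.equals(owner) ? null : current);
    }

    public static void main(String[] args) {
        LeaseTable table = new LeaseTable();
        System.out.println(table.tryAcquire("migration-lease", "host-a", 0, 30_000));      // host-a leads
        System.out.println(table.tryAcquire("migration-lease", "host-b", 10_000, 30_000)); // host-b must wait
    }
}
```

The expiry time is a design choice rather than a requirement: it keeps a leader that crashes mid-migration from blocking every other instance forever, at the cost of having to choose a TTL longer than the slowest expected migration.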

Let me know if that solves your problem.

Best,
Patrick

max-work commented 4 years ago

Thanks, Patrick. I will try that.

max-work commented 4 years ago

Hi Patrick, even the latest version has the issue. I am also seeing a new problem: the migration runs successfully, but some tables are missing from the keyspace.

patka commented 4 years ago

Hi,

This is really strange; so far you are the only person reporting this issue. Are you managing all tables with cassandra-migration, or is something else involved in the table migration? Could you put together a small Spring project that reproduces the issue?

warmuuh commented 4 years ago

Hi. I am experiencing the same issue (race condition) on version 2.2.1_v4. Because I am on v4, I cannot upgrade to 2.3. Will you port the leader-election change to the v4 branch as well?

patka commented 4 years ago

That is actually my plan, but like everybody else I am currently working remotely and have two kids to manage as well. I will try to find some time to do this, but I cannot give you an ETA at the moment.

ysheela commented 2 years ago

I also noticed a similar issue, even in 2.5.0_v4, i.e. the latest version.

One of the migration scripts drops a column. Locally we did not see a problem, but in our QA environment it appears that more than one instance of the service was performing the migration. As a result, the task failed when another instance of the service tried to perform the same migration simultaneously, i.e. drop the same column.

A question for clarification: I noticed in the code that an instance of the service tries to take the lead on migration using an LWT, and the consistency level used is QUORUM. Shouldn't this be SERIAL? I wonder whether this is why takeLeadOnMigrations does not work as expected when multiple services are involved.
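
Some background on the consistency question. In the DataStax Java driver, a conditional (LWT) statement carries two levels: the serial consistency level (`SERIAL` or `LOCAL_SERIAL`) governs the Paxos phase, while the regular consistency level (for example `QUORUM`) governs the commit phase. The guarantee a quorum-based compare-and-set relies on is that any two quorums over the same replica set share at least one replica. Cassandra documents QUORUM as (sum of replication factors / 2) + 1 with integer division, and LOCAL_QUORUM as the same formula applied to the local data center's replication factor. The helper below is only an arithmetic sketch of those formulas and of the overlap property; `QuorumMath` and its methods are hypothetical names.

```java
// Hypothetical helper illustrating Cassandra's documented quorum formulas.
final class QuorumMath {
    private QuorumMath() {}

    /** QUORUM across all data centers: (sum of replication factors / 2) + 1. */
    static int quorum(int... replicationFactorPerDc) {
        int sum = 0;
        for (int rf : replicationFactorPerDc) {
            sum += rf;
        }
        return sum / 2 + 1;  // integer division
    }

    /** LOCAL_QUORUM: the same formula applied to one data center's factor. */
    static int localQuorum(int localReplicationFactor) {
        return localReplicationFactor / 2 + 1;
    }

    /** Any two subsets of size q drawn from n replicas must intersect when 2 * q > n. */
    static boolean quorumsOverlap(int replicas, int quorumSize) {
        return 2 * quorumSize > replicas;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));                          // single DC, RF 3
        System.out.println(quorum(3, 3));                       // two DCs, RF 3 each
        System.out.println(localQuorum(3));                     // one DC of the pair
        System.out.println(quorumsOverlap(6, localQuorum(3)));  // local quorums across 6 replicas
    }
}
```

The last line shows why data-center-local levels matter for a lock: with two data centers of RF 3, two local quorums of 2 replicas each are not guaranteed to intersect among the 6 replicas, so if instances run in several data centers, `LOCAL_SERIAL` alone would not exclude a second leader elsewhere.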

patka commented 2 years ago

Hi @ysheela,

I will look into this in the next couple of days and get back to you. Maybe I can provide you with a version where the consistency level for taking the lead can be configured, so you can try again. As this is a race condition, I assume you do not have a reproducible test case?
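
Races like this rarely show up locally, but a test can make them much more likely by releasing several simulated instances at the same instant. The sketch below is a hypothetical harness, not part of cassandra-migration: each thread stands in for one service instance, a `CountDownLatch` acts as a start gate, and an `AtomicReference` compare-and-set stands in for the leader-election LWT. Without the guard, the number of instances that run the script can vary between runs; with it, only one instance does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stress harness: each worker thread plays one service instance.
class ConcurrentStartHarness {

    /** Starts all instances at once; returns how many ran the simulated script. */
    static int run(int instances, boolean useLeaderElection) throws Exception {
        // One thread per instance, so every instance can park at the gate together.
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        CountDownLatch ready = new CountDownLatch(instances);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicReference<String> leader = new AtomicReference<>(null);
        AtomicInteger appliedVersion = new AtomicInteger(1);  // version recorded as applied
        AtomicInteger scriptRuns = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < instances; i++) {
                String instanceId = "instance-" + i;
                futures.add(pool.submit(() -> {
                    ready.countDown();
                    startGate.await();  // wait until every instance is ready
                    // Stand-in for the leader-election LWT: only one CAS can win.
                    boolean mayMigrate = !useLeaderElection
                            || leader.compareAndSet(null, instanceId);
                    // Check-then-act on the applied version: racy without a leader.
                    if (mayMigrate && appliedVersion.get() < 2) {
                        scriptRuns.incrementAndGet();  // stand-in for executing script 2
                        appliedVersion.set(2);
                    }
                    return null;
                }));
            }
            ready.await();          // all instances are parked at the gate
            startGate.countDown();  // release them at the same moment
            for (Future<?> f : futures) {
                f.get();            // surface any exception thrown by a worker
            }
        } finally {
            pool.shutdown();
        }
        return scriptRuns.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without leader election: " + run(8, false));
        System.out.println("with leader election:    " + run(8, true));
    }
}
```

Against a real cluster, the simulated body would be replaced by starting the actual migration in each thread, and the unguarded count greater than 1 is the symptom to look for; repeating the run many times raises the chance of hitting the bad interleaving.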