spring-cloud / spring-cloud-dataflow

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0
1.1k stars 578 forks source link

Getting deadlock error while executing a task #5934

Open akshaylad26 opened 6 days ago

akshaylad26 commented 6 days ago

Description: We are getting intermittent deadlock issue while executing tasks. We observed that this usually happens when our execution cleanup job is running in parallel. Below is the stacktrace. We are unable to reproduce this issue in lower environments, but it is frequently happening in client environments. Can you please check? If any details required, please let me know, I will extract it.

Release versions: We are using 2.11.4 release version.

org.springframework.dao.CannotAcquireLockException: PreparedStatementCallback; SQL [INSERT INTO BOOT3_BATCH_JOB_EXECUTION(JOB_EXECUTION_ID, JOB_INSTANCE_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, VERSION, CREATE_TIME, LAST_UPDATED) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ]; Deadlock found when trying to get lock; try restarting transaction at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:78) at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1572) at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:667) at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:960) at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1015) at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:1020) at org.springframework.batch.core.repository.dao.JdbcJobExecutionDao.saveJobExecution(JdbcJobExecutionDao.java:243) at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:178) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:391) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) at org.springframework.batch.core.repository.support.AbstractJobRepositoryFactoryBean.lambda$getObject$0(AbstractJobRepositoryFactoryBean.java:204) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:244) at jdk.proxy2/jdk.proxy2.$Proxy94.createJobExecution(Unknown Source) at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:145) at org.springframework.batch.core.launch.support.TaskExecutorJobLauncher.run(TaskExecutorJobLauncher.java:59) at org.springframework.cloud.task.batch.handler.TaskJobLauncherApplicationRunner.execute(TaskJobLauncherApplicationRunner.java:147) at org.springframework.boot.autoconfigure.batch.JobLauncherApplicationRunner.executeLocalJobs(JobLauncherApplicationRunner.java:195) at org.springframework.boot.autoconfigure.batch.JobLauncherApplicationRunner.launchJobFromProperties(JobLauncherApplicationRunner.java:175) at org.springframework.cloud.task.batch.handler.TaskJobLauncherApplicationRunner.run(TaskJobLauncherApplicationRunner.java:112) at org.springframework.boot.autoconfigure.batch.JobLauncherApplicationRunner.run(JobLauncherApplicationRunner.java:165) at org.springframework.boot.SpringApplication.lambda$callRunner$4(SpringApplication.java:771) at org.springframework.util.function.ThrowingConsumer$1.acceptWithException(ThrowingConsumer.java:83) at org.springframework.util.function.ThrowingConsumer.accept(ThrowingConsumer.java:60) at org.springframework.util.function.ThrowingConsumer$1.accept(ThrowingConsumer.java:88) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:783) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771) at org.springframework.boot.SpringApplication.lambda$callRunners$3(SpringApplication.java:759) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:357) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:759) at org.springframework.boot.SpringApplication.run(SpringApplication.java:332) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1327) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1316) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49) at org.springframework.boot.loader.Launcher.launch(Launcher.java:95) at org.springframework.boot.loader.Launcher.launch(Launcher.java:58) at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65) Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:124) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916) at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1061) at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1009) at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1320) at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdate(ClientPreparedStatement.java:994) at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java) at org.springframework.jdbc.core.JdbcTemplate.lambda$update$2(JdbcTemplate.java:965) at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:648) ... 55 common frames omitted

corneil commented 6 days ago

@akshaylad26 Is this happening in a Batch Task application that is launched by Spring Cloud Data Flow or using a Schedule?

corneil commented 6 days ago

@akshaylad26 Can you provide more information on your cleanup job?

cppwfs commented 6 days ago

@akshaylad26 Please provide the version of MySql and the platform you are launching the applications (Kubernetes, local, Cloud Foundry). Thanks!

akshaylad26 commented 6 days ago

We are launching using Kubernetes & MySQL version is 8.0.33 Both cleanup job & batch task application are scheduled operations. Cleanup job executes every 6 hours whereas task is scheduled every hour. In Cleanup job we get executions using /tasks/executions endpoint in a paginated way sorted by task execution id in ascending order. Then we use /task/executions/{list of executionIds}?action=REMOVE_DATA,CLEANUP&schemaTarget=boot3 endpoint for cleaning up older executions. We follow this way because normal cleanup API provided by SCDF doesn't clean executions without any start/end date.

Let me know if this helps.

cppwfs commented 6 days ago

What transaction isolation level are you using for your batch application(s)?

akshaylad26 commented 6 days ago

We use default isolation level. We use hikari datasource & dont override it.

cppwfs commented 6 days ago

You may wish to consider setting the transaction isolation level of your apps to something like ISOLATION_REPEATABLE_READ as discussed in Spring Batch's Javadoc.

akshaylad26 commented 6 days ago

sure will give it a try. Thanks!

akshaylad26 commented 5 days ago

Creating new transaction with name [org.springframework.cloud.dataflow.server.service.LauncherInitializationService.initialize]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT

I see this line in SCDF server DEBUG logs as well. Does this also need to be updated to ISOLATION_REPEATABLE_READ?

cppwfs commented 5 days ago

Thank you for your update! For now SCDF needs to keep its current transaction isolation level.

akshaylad26 commented 5 days ago

I just checked in our Database, its default isolation level itself is REPEATABLE_READ.

image

cppwfs commented 4 days ago

Understood. Not sure that necessarily affects the Batch apps in that it sets its own default.