spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0
2.68k stars 2.33k forks source link

SQL Operations should be retried rather than throwing CannotAcquireLockException. #4523

Open ramshers opened 8 months ago

ramshers commented 8 months ago

Bug description When concurrent instances of a Job do SQL operations on same table at same time, CannotAcquireLockException should not be thrown but instead the SQL operation should be retried.

Environment $ java -version java version "17.0.8" 2023-07-18 LTS Java(TM) SE Runtime Environment (build 17.0.8+9-LTS-211) Java HotSpot(TM) 64-Bit Server VM (build 17.0.8+9-LTS-211, mixed mode, sharing)

Spring Boot Version - 3.1.3 Spring Batch Version - 5.0.3 DB - Postgres - 13.x with Isolation level READ_COMMITTED

Steps to reproduce This is difficult to reproduce. As we can not for sure time two job instances to update the DB at same time at a precision level of milli or nano seconds.

Description When runing the same spring batch job concurrently(i.e multiple instances of the same job are started at same time) with different identifying parms I faced -

Caused by: org.springframework.dao.CannotAcquireLockException: PreparedStatementCallback; SQL [INSERT INTO BATCH_JOB_EXECUTION(JOB_EXECUTION_ID, JOB_INSTANCE_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, VERSION, CREATE_TIME, LAST_UPDATED) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ]; ERROR: could not serialize access due to read/write dependencies among transactions Detail: Reason code: Canceled on identification as a pivot, during conflict in checking. Hint: The transaction might succeed if retried. at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:115) at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:70) at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:79) at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1580)

As the error above indicates this is due to Locking done by one job instance that blocks and then fails another job instance.

To resolve this, the Insert Operation (As far as I have explored.. this part of the code) has to be retried on facing CannotAcquireLockException rather than failing the whole job instance.

Similar approach should be taken when inserting in to other tables if there is possibility of facing CannotAcquireLockException.

Expected behavior To resolve this db insert/update/delete operations should not fail on facing CannotAcquireLockException. But instead should retry a couple of times till it succeeds.

Badii-kh commented 4 months ago

This problème is related to the default isolation level of the JobRepository (SERIALIZABLE). In my case, passing the isolation level to READ_COMMITTED resolve the issue.

spring.batch.jdbc.isolation-level-for-create: READ_COMMITTED