Closed kagkarlsson closed 1 year ago
Remaining to test against MSSQL. Verified to work with postgres using new generic method that is transaction-based.
This is a bit tricky to test since Testcontainers does not seem to work on M1 Macs and testing via Github Actions sporadically fails. I have done some local testing using AzureSQLEdge, and the lock-and-fetch strategy seem to be working. If anyone using MSSQL could try this branch out that would be helpful.
With a little luck this might fix the deadlock issues experienced using the default polling strategy. I have gotten the deadlock locally (AzureSQLEdge) using default polling, but not using lock-and-fetch.
Is it possible for you do help out verifying this change for MSSQL @rafaelhofmann ?
This is the query currently in use:
@Override
public String createGenericSelectForUpdateQuery(String tableName, int limit, String requiredAndCondition) {
return "select TOP "+limit+" * from " + tableName +
" WITH (readpast,rowlock) where picked = ? and execution_time <= ? " + requiredAndCondition +
" order by execution_time asc ";
}
I added with (readpast)
to the regular query as well for mssql (locally), and now I do not get the dead-lock issues @rafaelhofmann
Hi @kagkarlsson
I will test this new branch and check if we still have problems with the indexes.
I have enabled both polling-strategies for MSSQL in this branch. Both seem to run successfully on GA via MssqlCompatibilityTest
Additionally, I added hints to skip locked to default-query as well since deadlocks were causing sporadic test-failures (for MSSQL)
Hi @kagkarlsson,
Do you have a published snapshot version that we can use to test or should we build our own version?
Thanks
Currently you would need to build your own. Let me know if that is a problem.
I have built the branch and upgraded our ACPT system (including all the indexes that were deleted in the past) to the new release, but we are running into the same problem as before: A deadlock about once every hour, when we have a load of 300 tasks/min. I have been experimenting with a query that combines the SELECT and UPDATE in one, but I am not quite happy yet with the query hint for MSSQL:
String selectForUpdateQuery =
" UPDATE " + ctx.tableName +
" SET " + ctx.tableName + ".picked = ?, " +
" " + ctx.tableName + ".picked_by = ?, " +
" " + ctx.tableName + ".last_heartbeat = ?, " +
" " + ctx.tableName + ".version = " + ctx.tableName + ".version + 1 " +
" OUTPUT [inserted].* " +
" FROM ( " +
" SELECT TOP(?) ist2.task_name, ist2.task_instance " +
" FROM " + ctx.tableName + " ist2 WITH (ROWLOCK, READPAST) " +
" WHERE ist2.picked = ? AND ist2.execution_time <= ? " + unresolvedFilter.andCondition() +
" ORDER BY ist2.execution_time ASC " +
" ) AS st2 " +
" WHERE st2.task_name = " + ctx.tableName + ".task_name " +
" AND st2.task_instance = " + ctx.tableName + ".task_instance";
In theory this should do the same as the lock-and-fetch
query from PostgreSQL.
Ok, thanks. Did you try both lock-and-fetch
and fetch-and-lock-on-execute
? I think they should both work for MSSQL now. (lock-and-fetch
for MSSQL is select-for-update and update spanned by a transaction)
Your combined query looks interesting
We had deadlocks with the lock-and-fetch
strategy, but so far no errors with the default fetch
strategy after four hours of running. Will keep this running for a few more days to verify.
How has the fetch
strategy run?
The fetch
strategy runs without any deadlock or error for the last 30 days with all the recommended indexes.
Ok, that is good news. I was hoping for good news on the other strategy as well, but I will take 1 out of 2 :).
This means you are happy with the changes in the PR and would like to have it merged?
Great fix! Any ETA on release with this fix?
Good question, nothing blocking it from being released, just time I guess :). Hopefully I can release it within 1-2 weeks
🎉 This issue has been resolved in v13.0.0
(Release Notes)
Hi! I am currently evaluating db-scheduler for our use-cases and doing some code reviews in the process.
I see that the MariaDBJdbcCustomization
, now uses FOR UPDATE SKIP LOCKED
for SELECT-queries even though MariaDB only supports SKIP LOCKED
since MariaDB 10.6, according to the documentation: https://mariadb.com/kb/en/select/#skip-locked
I don't particularly care about older versions of MariaDB, but I just want to let you know that the versions 10.4 and 10.5 are technically still supported, according to https://mariadb.com/kb/en/mariadb-server-release-dates/. I see you already differentiate the versions of MySQL, so maybe you want to do the same for MariaDB.
Anyway, thank you for your work! I like the simplicity of db-scheduler and I hope we can use it to eventually replace Quartz, which seems to be completely dead by now.
I had some issue when testing the support in MySQL (or MariaDB) and limited time, so I opted for disabling the support until someone had the time to verify it works as expected. Feel free to fork and try it out.
@Override
public boolean supportsGenericLockAndFetch() {
// FIXLATER: fix syntax and enable it for versions of MariaDB that supports it
return false;
}
JdbcCustomizations
. (Groundwork necessary to supportSELECT FOR UPDATE .. SKIP LOCKED
and explicitLIMIT
for different databases.)JdbcCustomization
lock-and-fetch
aka.SELECT FOR UPDATE .. SKIP LOCKED
for MSSQL/SqlServer. However, testing has shown that lock-and-fetch is prone to deadlocks for SqlServer, so it is not recommended until that is understood/resolved (reproduce via disabled test inMssqlClusterTest
). A silver-lining is that a query-hint was also added tofetch-and-lock-on-execute
for SqlServer, and it appears this resolves issues with deadlocks for that (default) strategy.micro-jdbc
JdbcRunner
and refactor transaction-handling (see PR #4)micro-jdbc
dependencyFixes
264
337
348 ?
273 ?
Further work / fix later
Reminders
cc @kagkarlsson