Closed klopfdreh closed 10 months ago
This query seems to be slow as it accesses several views:
SELECT *
FROM (SELECT E.JOB_EXECUTION_ID as JOB_EXECUTION_ID,
E.START_TIME as START_TIME,
E.END_TIME as END_TIME,
E.STATUS as STATUS,
E.EXIT_CODE as EXIT_CODE,
E.EXIT_MESSAGE as EXIT_MESSAGE,
E.CREATE_TIME as CREATE_TIME,
E.LAST_UPDATED as LAST_UPDATED,
E.VERSION as VERSION,
I.JOB_INSTANCE_ID as JOB_INSTANCE_ID,
I.JOB_NAME as JOB_NAME,
T.TASK_EXECUTION_ID as TASK_EXECUTION_ID,
E.SCHEMA_TARGET as SCHEMA_TARGET,
(SELECT COUNT(*)
FROM AGGREGATE_STEP_EXECUTION S
WHERE S.JOB_EXECUTION_ID = E.JOB_EXECUTION_ID
AND S.SCHEMA_TARGET = E.SCHEMA_TARGET) as STEP_COUNT
FROM AGGREGATE_JOB_INSTANCE I
JOIN AGGREGATE_JOB_EXECUTION E
ON I.JOB_INSTANCE_ID = E.JOB_INSTANCE_ID AND I.SCHEMA_TARGET = E.SCHEMA_TARGET
JOIN AGGREGATE_TASK_BATCH B
ON E.JOB_EXECUTION_ID = B.JOB_EXECUTION_ID AND E.SCHEMA_TARGET = B.SCHEMA_TARGET
JOIN AGGREGATE_TASK_EXECUTION T
ON B.TASK_EXECUTION_ID = T.TASK_EXECUTION_ID AND B.SCHEMA_TARGET = T.SCHEMA_TARGET
ORDER BY E.JOB_EXECUTION_ID DESC)
WHERE ROWNUM <= 20
For 35361 job execution entries when I remove ORDER BY E.JOB_EXECUTION_ID DESC
the query execution time is increased from 2s501ms
to 171ms
maybe there is an index missing?
Also you can't navigate through the job execution pages anymore.
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT E.JOB_EXECUTION_ID FROM (SELECT E.JOB_EXECUTION_ID, ROWNUM as TMP_ROW_NUM FROM (SELECT E.JOB_EXECUTION_ID FROM AGGREGATE_JOB_INSTANCE I JOIN AGGREGATE_JOB_EXECUTION E ON I.JOB_INSTANCE_ID=E.JOB_INSTANCE_ID AND I.SCHEMA_TARGET=E.SCHEMA_TARGET JOIN AGGREGATE_TASK_BATCH B ON E.JOB_EXECUTION_ID = B.JOB_EXECUTION_ID AND E.SCHEMA_TARGET = B.SCHEMA_TARGET JOIN AGGREGATE_TASK_EXECUTION T ON B.TASK_EXECUTION_ID = T.TASK_EXECUTION_ID AND B.SCHEMA_TARGET = T.SCHEMA_TARGET ORDER BY E.JOB_EXECUTION_ID DESC)) WHERE TMP_ROW_NUM = 20]; nested exception is java.sql.SQLSyntaxErrorException: ORA-00904: \"E\".\"JOB_EXECUTION_ID\": ungültige ID
Side note: We have around 500k job executions in our database as of now. With this amount the job executions are not displayed anylonger.
I did some tests and tried out the following:
SELECT *
FROM (SELECT E.JOB_EXECUTION_ID as JOB_EXECUTION_ID,
E.START_TIME as START_TIME,
E.END_TIME as END_TIME,
E.STATUS as STATUS,
E.EXIT_CODE as EXIT_CODE,
E.EXIT_MESSAGE as EXIT_MESSAGE,
E.CREATE_TIME as CREATE_TIME,
E.LAST_UPDATED as LAST_UPDATED,
E.VERSION as VERSION,
I.JOB_INSTANCE_ID as JOB_INSTANCE_ID,
I.JOB_NAME as JOB_NAME,
T.TASK_EXECUTION_ID as TASK_EXECUTION_ID,
E.SCHEMA_TARGET as SCHEMA_TARGET,
(SELECT COUNT(*)
FROM AGGREGATE_STEP_EXECUTION S
WHERE S.JOB_EXECUTION_ID = E.JOB_EXECUTION_ID
AND S.SCHEMA_TARGET = E.SCHEMA_TARGET) as STEP_COUNT,
ROW_NUMBER() OVER (PARTITION BY E.JOB_EXECUTION_ID ORDER BY E.JOB_EXECUTION_ID desc) as rn
FROM AGGREGATE_JOB_INSTANCE I
JOIN AGGREGATE_JOB_EXECUTION E
ON I.JOB_INSTANCE_ID = E.JOB_INSTANCE_ID AND I.SCHEMA_TARGET = E.SCHEMA_TARGET
JOIN AGGREGATE_TASK_BATCH B
ON E.JOB_EXECUTION_ID = B.JOB_EXECUTION_ID AND E.SCHEMA_TARGET = B.SCHEMA_TARGET
JOIN AGGREGATE_TASK_EXECUTION T
ON B.TASK_EXECUTION_ID = T.TASK_EXECUTION_ID AND B.SCHEMA_TARGET = T.SCHEMA_TARGET
ORDER BY E.JOB_EXECUTION_ID DESC)
where ROWNUM <= 20
by adding ROW_NUMBER() OVER (PARTITION BY E.JOB_EXECUTION_ID ORDER BY E.JOB_EXECUTION_ID desc) as rn
the performance was increased drastically
Before:
After:
@cppwfs / @corneil - I hope this helps with the SQL analysis. 😀
Hint this function is available for:
For the issue https://github.com/spring-cloud/spring-cloud-dataflow/issues/5524#issuecomment-1774833293 I enabled the logging and it seems that the JdbcAggregateJobQueryDao
is generating a paging query which is not valid:
{"timestamp":"2023-10-25T06:35:47.998+0200","level":"DEBUG","thread":"https-jsse-nio-8443-exec-3","logger":"org.springframework.cloud.dataflow.server.repository.JdbcAggregateJobQueryDao","message":"countJobExecutions:SELECT COUNT(1) from AGGREGATE_JOB_EXECUTION","context":"default"}
{"timestamp":"2023-10-25T06:35:47.998+0200","level":"DEBUG","thread":"https-jsse-nio-8443-exec-3","logger":"org.springframework.jdbc.core.JdbcTemplate","message":"Executing SQL query [SELECT COUNT(1) from AGGREGATE_JOB_EXECUTION]","context":"default"}
{"timestamp":"2023-10-25T06:35:48.000+0200","level":"DEBUG","thread":"https-jsse-nio-8443-exec-3","logger":"org.springframework.cloud.dataflow.server.repository.JdbcAggregateJobQueryDao","message":"queryJumpToItem:SELECT E.JOB_EXECUTION_ID FROM (SELECT E.JOB_EXECUTION_ID, ROWNUM as TMP_ROW_NUM FROM (SELECT E.JOB_EXECUTION_ID FROM AGGREGATE_JOB_INSTANCE I JOIN AGGREGATE_JOB_EXECUTION E ON I.JOB_INSTANCE_ID=E.JOB_INSTANCE_ID AND I.SCHEMA_TARGET=E.SCHEMA_TARGET JOIN AGGREGATE_TASK_BATCH B ON E.JOB_EXECUTION_ID = B.JOB_EXECUTION_ID AND E.SCHEMA_TARGET = B.SCHEMA_TARGET JOIN AGGREGATE_TASK_EXECUTION T ON B.TASK_EXECUTION_ID = T.TASK_EXECUTION_ID AND B.SCHEMA_TARGET = T.SCHEMA_TARGET ORDER BY E.JOB_EXECUTION_ID DESC)) WHERE TMP_ROW_NUM = 20:20:20:[]","context":"default"}
{"timestamp":"2023-10-25T06:35:48.000+0200","level":"DEBUG","thread":"https-jsse-nio-8443-exec-3","logger":"org.springframework.jdbc.core.JdbcTemplate","message":"Executing prepared SQL query","context":"default"}
{"timestamp":"2023-10-25T06:35:48.001+0200","level":"DEBUG","thread":"https-jsse-nio-8443-exec-3","logger":"org.springframework.jdbc.core.JdbcTemplate","message":"Executing prepared SQL statement [SELECT E.JOB_EXECUTION_ID FROM (SELECT E.JOB_EXECUTION_ID, ROWNUM as TMP_ROW_NUM FROM (SELECT E.JOB_EXECUTION_ID FROM AGGREGATE_JOB_INSTANCE I JOIN AGGREGATE_JOB_EXECUTION E ON I.JOB_INSTANCE_ID=E.JOB_INSTANCE_ID AND I.SCHEMA_TARGET=E.SCHEMA_TARGET JOIN AGGREGATE_TASK_BATCH B ON E.JOB_EXECUTION_ID = B.JOB_EXECUTION_ID AND E.SCHEMA_TARGET = B.SCHEMA_TARGET JOIN AGGREGATE_TASK_EXECUTION T ON B.TASK_EXECUTION_ID = T.TASK_EXECUTION_ID AND B.SCHEMA_TARGET = T.SCHEMA_TARGET ORDER BY E.JOB_EXECUTION_ID DESC)) WHERE TMP_ROW_NUM = 20]","context":"default"}
As you can see the query refers two times to E.JOB_EXECUTION_ID
which is not present in the outer select. I think this is caused because of the sortKeys
but I am not 100% sure.
For the performance I am trying out something like:
private PagingQueryProvider getPagingQueryProvider(String fields, String fromClause, String whereClause, Map<String, Order> sortKeys) throws Exception {
SqlPagingQueryProviderFactoryBean factory = new SqlPagingQueryProviderFactoryBean();
factory.setDataSource(dataSource);
fromClause = "AGGREGATE_JOB_INSTANCE I JOIN AGGREGATE_JOB_EXECUTION E ON I.JOB_INSTANCE_ID=E.JOB_INSTANCE_ID AND I.SCHEMA_TARGET=E.SCHEMA_TARGET" + (fromClause == null ? "" : " " + fromClause);
factory.setFromClause(fromClause);
if (fields == null) {
fields = FIELDS;
}
if (fields.contains("E.JOB_EXECUTION_ID")) {
Order order = sortKeys.get("E.JOB_EXECUTION_ID");
String orderString = Optional.ofNullable(order).map(orderKey -> orderKey == Order.DESCENDING ? "DESC" : "ASC").orElse("DESC");
fields += ", ROW_NUMBER() OVER (PARTITION BY E.JOB_EXECUTION_ID ORDER BY E.JOB_EXECUTION_ID " + orderString + ") as RN";
}
factory.setSelectClause(fields);
if (sortKeys.isEmpty()) {
sortKeys = Collections.singletonMap("E.JOB_EXECUTION_ID", Order.DESCENDING);
}
factory.setSortKeys(sortKeys);
factory.setWhereClause(whereClause);
return factory.getObject();
}
Good news - I fixed it.
Bad news - I guess it is an issue in Spring Batch 4
.
Here are all fixes I applied:
I applied the following lines in the JdbcAggregateJobQueryDao
as described in the previous comment.
if (fields.contains("E.JOB_EXECUTION_ID")) {
Order order = sortKeys.get("E.JOB_EXECUTION_ID");
String orderString = Optional.ofNullable(order).map(orderKey -> orderKey == Order.DESCENDING ? "DESC" : "ASC").orElse("DESC");
fields += ", ROW_NUMBER() OVER (PARTITION BY E.JOB_EXECUTION_ID ORDER BY E.JOB_EXECUTION_ID " + orderString + ") as RN";
}
In Spring Batch 4.x there is an issue in the OraclePagingQueryProvider
https://github.com/spring-projects/spring-batch/blob/4.3.x/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/support/OraclePagingQueryProvider.java#L50 getSortKeySelect()
is applied for generateRowNumSqlQueryWithNesting
as innerSelectClause
and outerSelectClause
which is wrong. outerSelectClause
should use the getSortKeysWithoutAliases()
and the keys should be joined in a helper method similar to getSortKeySelect()
In SqlPagingQueryUtils
https://github.com/spring-projects/spring-batch/blob/4.3.x/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/support/SqlPagingQueryUtils.java#L293 the method buildSortConditions
is using getSortKeys()
as well which is not working because the outer query does not have a reference to the qualifier of the join of the inner query. Here also provider.getSortKeysWithoutAliases().entrySet()
should be used.
Thanks @klopfdreh for all of the sleuthing and solutions. We will be taking a look soon and incorporating your findings. Much appreciated.
I split the other issue off into its own here https://github.com/spring-cloud/spring-cloud-dataflow/issues/5531
Hi @klopfdreh , getting back to this now.
On https://github.com/spring-cloud/spring-cloud-dataflow/issues/5524#issuecomment-1778630171, items 2 and 3 are for the related issue #5531, correct? You said good news is you fixed it. I can see how item 1 fixes the perf issue, but what did you do for 2 and 3?
Thanks
Hey @onobc,
On https://github.com/spring-cloud/spring-cloud-dataflow/issues/5524#issuecomment-1778630171, items 2 and 3 are for the related issue https://github.com/spring-cloud/spring-cloud-dataflow/issues/5531, correct?
yes this is correct - because of 2 and 3 the navigation was not possible
You said good news is you fixed it. I can see how item 1 fixes the perf issue,
It is related to Oracle and you can use the Oracle Optimizer to apply performance hints so that the queries based on that newly created query can be performed faster
, but what did you do for 2 and 3?
I described it without actual changes in the SCDF code as the changes are Spring Batch related. Because of this I pointed to the classes and wrote what I changed. Example for 2.:
- In Spring Batch 4.x there is an issue in the OraclePagingQueryProvider https://github.com/spring-projects/spring-batch/blob/4.3.x/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/support/OraclePagingQueryProvider.java#L50 getSortKeySelect() is applied for generateRowNumSqlQueryWithNesting as innerSelectClause and outerSelectClause which is wrong. outerSelectClause should use the getSortKeysWithoutAliases() and the keys should be joined in a helper method similar to getSortKeySelect()
Here are how the changes are applied:
@Override
public String generateJumpToItemQuery(int itemIndex, int pageSize) {
int page = itemIndex / pageSize;
int offset = (page * pageSize);
offset = offset == 0 ? 1 : offset;
String sortKeySelect = this.getSortKeySelect();
String sortKeyOuterSelect = this.getSortKeyOuterSelect();
return SqlPagingQueryUtils.generateRowNumSqlQueryWithNesting(this, sortKeySelect, sortKeyOuterSelect, false, "TMP_ROW_NUM = "
+ offset);
}
/* ...... */
private String getSortKeyOuterSelect() {
StringBuilder sql = new StringBuilder();
String prefix = "";
for (Map.Entry<String, Order> sortKey : this.getSortKeysWithoutAliases().entrySet()) {
sql.append(prefix);
prefix = ", ";
sql.append(sortKey.getKey());
}
return sql.toString();
}
/* ....... */
The issue here is that the outer query can't perform the sort keys with alias as the alias is only available in the inner select.
For 3. The changes are also written with words but no actual code changes but can be read the same way.
@onobc working like expected- great job 👍
@onobc working like expected- great job 👍
Are you kidding me!?, you did all the great sleuthing - I just implemented your suggestions 😸
It seems like the issue with Task and Task Execution is the same. Hasn't it been fixed yet? I'm currently using MariaDB
This update is now available in 2.11.3.
@cppwfs
It seems like there might be a similar issue even after upgrading to version 2.11.3. Any idea what could be the possible cause?
Description: Currently we are using SCDF 2.11.0 and about to upgrade to 2.11.1. The
JobExecutionThinController
is very slow and we are receiving a timeout on a normalGET
operations with a Oracle DB. It might be an issue with the upgrade to 2.11.x, because as of the changes there are several views introduced for example to getAGGREGATE_JOB_EXECUTION
andAGGREGATE_JOB_INSTANCE
. It might be an issue related to this.After this issue the "Job Executions" Grid stays empty.
Release versions: 2.11.x
Custom apps: N/A
Steps to reproduce: Produce a large amout of Job-Executions and just click on
Job executions
in the UI.Screenshots:
Additional context:
"Task Executions" is working as expected.