spring-cloud / spring-cloud-dataflow

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0

Long duration time for REST List All task definition #5841

Closed szopal closed 3 months ago

szopal commented 3 months ago

In 2.11.2 I've noticed that listing all task definitions takes a very long time. After analyzing the code, I think I found the reason, in class DefaultAggregateTaskExplorer:

TaskDefinitionController passes all task definitions to getLatestTaskExecutionsByTaskNames as a list (e.g. 100 definitions). But inside getLatestTaskExecutionsByTaskNames there is a loop, and every iteration of that loop executes

        List<AggregateTaskExecution> taskExecutions = taskExplorer.getLatestTaskExecutionsByTaskNames(taskNames) 

for the whole list, not for the single current element:

taskExplorer.getLatestTaskExecutionsByTaskNames(taskNames) VS taskExplorer.getLatestTaskExecutionsByTaskNames(taskName)

    @Override
    public List<AggregateTaskExecution> getLatestTaskExecutionsByTaskNames(String... taskNames) {
        List<AggregateTaskExecution> result = new ArrayList<>();
        for (String taskName : taskNames) { // iterate over all task names
            SchemaVersionTarget target = aggregateExecutionSupport.findSchemaVersionTarget(taskName, taskDefinitionReader);
            String platformName = getPlatformName(taskName);
            Assert.notNull(target, "Expected to find SchemaVersionTarget for " + taskName);
            TaskExplorer taskExplorer = taskExplorers.get(target.getName());
            Assert.notNull(taskExplorer, "Expected TaskExplorer for " + target.getName());
            List<AggregateTaskExecution> taskExecutions = taskExplorer.getLatestTaskExecutionsByTaskNames(taskNames) // passes the whole taskNames array on every iteration, not the single taskName
                    .stream()
                    .map(execution -> aggregateExecutionSupport.from(execution, target.getName(), platformName))
                    .collect(Collectors.toList());
            result.addAll(taskExecutions);
        }
        return result;
    }
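The effect of passing the whole array inside the loop can be seen in a small standalone sketch. It does not use the real Data Flow classes: `lookupLatest` is a hypothetical stand-in for the `TaskExplorer` call and is assumed to return one entry per requested name, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LatestExecutionsSketch {

    /** Hypothetical stand-in for the explorer call: one "latest execution" per requested name. */
    static List<String> lookupLatest(String... taskNames) {
        List<String> executions = new ArrayList<>();
        for (String name : taskNames) {
            executions.add("latest-execution-of-" + name);
        }
        return executions;
    }

    /** Mirrors the loop quoted above: every iteration passes the whole array. */
    static List<String> wholeListPerIteration(String... taskNames) {
        List<String> result = new ArrayList<>();
        for (String taskName : taskNames) {
            result.addAll(lookupLatest(taskNames)); // N lookups of N names each
        }
        return result;
    }

    /** Variant that passes only the current element. */
    static List<String> singleNamePerIteration(String... taskNames) {
        List<String> result = new ArrayList<>();
        for (String taskName : taskNames) {
            result.addAll(lookupLatest(taskName)); // N lookups of 1 name each
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"task-a", "task-b", "task-c"};
        System.out.println(wholeListPerIteration(names).size());  // prints 9
        System.out.println(singleNamePerIteration(names).size()); // prints 3
    }
}
```

Under these assumptions, the first variant does N lookups of N names each and returns N*N entries (with duplicates), while the second returns N. For 100 definitions that would be 10,000 looked-up names instead of 100.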

Please let me know whether the whole list or a single element of the list should be passed.

cppwfs commented 3 months ago

There was an optimization for this in 2.11.3. Please kick the tires there and see if it helps.

Thanks!

szopal commented 3 months ago

Yes, after passing a single element, the time decreased from 30 seconds to 15 seconds for 100 task definitions and 1300 task executions. However, I am concerned that if the number of records in the TASK_EXECUTION table is very large, on the order of several hundred thousand, the problem may still occur.