kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Retrieving template processes is inefficient for a large number of template processes #6267

Open thomaslow opened 1 month ago

thomaslow commented 1 month ago

When clicking the "+" button of a project in order to create or import a new process, the page takes more than 30 seconds to load.

To Reproduce

Steps to reproduce the behavior:

  1. Go to dashboard
  2. Click on + icon of project
  3. Get a coffee
  4. See the "Create new process" page

Expected behavior

The "Create new process" page should appear without any delay.

Release

Master

What is happening

I traced the problem to the "Process template" dialog that contains a list of all processes:

Screenshot from 2024-10-14 17-11-16

With my test database, which contains ~80,000 processes, this dialog tries to load all processes and their related entities, see:

https://github.com/kitodo/kitodo-production/blob/4d6138a8e84c4ca84b7438644c6d651eaf95c744/Kitodo/src/main/java/org/kitodo/production/forms/createprocess/SearchDialog.java#L65-L72

https://github.com/kitodo/kitodo-production/blob/4d6138a8e84c4ca84b7438644c6d651eaf95c744/Kitodo/src/main/java/org/kitodo/production/services/data/ProcessService.java#L2845-L2855

There are multiple problems:

By limiting the ElasticSearch query to return at most 100 processes, the "Create new process" page loads fine without any noticeable delay.
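To illustrate the effect of such a cap, here is a minimal sketch. It is not the Kitodo code: `ProcessStub`, `findTemplateProcesses` and `maxResults` are hypothetical names, and the cap is applied to an in-memory list only to keep the example self-contained. In a real system the limit would be sent to the search backend (for example as the `size` of the search request), so that the remaining processes are never loaded at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class TemplateProcessLimitSketch {

    /** Hypothetical stand-in for a process; not a Kitodo class. */
    static final class ProcessStub {
        final int id;
        final boolean inChoiceListShown;

        ProcessStub(int id, boolean inChoiceListShown) {
            this.id = id;
            this.inChoiceListShown = inChoiceListShown;
        }
    }

    /** Returns at most maxResults processes that have inChoiceListShown set. */
    static List<ProcessStub> findTemplateProcesses(List<ProcessStub> all, int maxResults) {
        return all.stream()
                .filter(p -> p.inChoiceListShown)
                .limit(maxResults) // stop after the first maxResults matches
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumption: every second process is flagged, roughly mimicking the test database.
        List<ProcessStub> all = new ArrayList<>();
        for (int i = 1; i <= 80_000; i++) {
            all.add(new ProcessStub(i, i % 2 == 0));
        }
        List<ProcessStub> page = findTemplateProcesses(all, 100);
        System.out.println(page.size() + " of " + all.size() + " processes returned");
    }
}
```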

Suggested Changes

My suggestion would be to replace this drop-down list with a simple search dialog, or to ask the user to provide the process id, so that the page never has to list all processes.
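A rough sketch of what this could look like, under stated assumptions: `ProcessStub`, `searchByTitle` and `findById` are hypothetical names and not part of Kitodo.Production, and the candidates are held in memory only for illustration. The idea is that nothing is listed until the user types a search term, and the number of hits is bounded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class TemplateProcessSearchSketch {

    /** Hypothetical stand-in for a template process; not a Kitodo class. */
    static final class ProcessStub {
        final int id;
        final String title;

        ProcessStub(int id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Case-insensitive title search that returns at most maxResults hits. */
    static List<ProcessStub> searchByTitle(List<ProcessStub> candidates, String term, int maxResults) {
        List<ProcessStub> hits = new ArrayList<>();
        String needle = term.trim().toLowerCase(Locale.ROOT);
        if (needle.isEmpty() || maxResults <= 0) {
            return hits; // an empty term lists nothing instead of everything
        }
        for (ProcessStub p : candidates) {
            if (p.title.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(p);
                if (hits.size() >= maxResults) {
                    break;
                }
            }
        }
        return hits;
    }

    /** Direct lookup for the case where the user enters the process id. */
    static Optional<ProcessStub> findById(List<ProcessStub> candidates, int id) {
        for (ProcessStub p : candidates) {
            if (p.id == id) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<ProcessStub> candidates = Arrays.asList(
                new ProcessStub(1, "Newspaper 1850"),
                new ProcessStub(2, "Newspaper 1851"),
                new ProcessStub(3, "Journal 1900"));
        System.out.println(searchByTitle(candidates, "news", 10).size());
        System.out.println(findById(candidates, 3).isPresent());
    }
}
```

In an actual implementation both methods would translate into bounded queries against the index or database rather than loops over a preloaded list.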

henning-gerhardt commented 1 month ago

Did you use a catalogue or not? When I create a new process, I first get the catalogue dialog, and afterwards I can use the process templates to create a new process based on the selected process template.

As far as I know, only processes that are used as a process template should be shown. These process template processes are serial / periodical issues, which should not exist in large numbers in a Kitodo.Production instance (@andre-hohmann correct me if I'm wrong). Your finding is still valid, but the solution should be discussed.

thomaslow commented 1 month ago

@henning-gerhardt I have only set up one "process template" for each project. If I set up two process templates, there is a selection dialog before the "Create new process" page is loaded. However, the selection has no impact on performance.

I guess the problem is that, a few years ago, I auto-generated thousands of processes with the property "inChoiceListShown" set to true, so that this particular dialog now contains a huge list.

If this is not a problem in live production systems, I will manually edit my test database to fix the problem for me. Thank you for your feedback!

henning-gerhardt commented 1 month ago

You got trapped. There are two kinds of "process templates": the well-known process template of a project, and the processes flagged with "inChoiceListShown", which the process creation dialog offers in its "process template" chooser (visible in your own screenshot, below the selected ruleset). The processes listed in your screenshot are those that have the "inChoiceListShown" property set to true in the database / index. They are called "process templates" too, but they are based on already existing processes and not on the "process template" of a project. Better naming is needed here, at least in English (and maybe in German too), to make the difference clear.

thomaslow commented 1 month ago

Okay. I'll keep the issue open for a few days, in case somebody else has had a similar problem or thinks we need to implement some changes. If nobody else comments, I will close it then.

solth commented 1 month ago

I think @henning-gerhardt is mostly correct, just one small remark. To avoid confusion (which is obviously only partly successful, based on Henning's description!), the "Templates", which have their own database table, are called "Templates" or "Process templates", while the processes with the specific flag inChoiceListShown set to true are referred to as "Template processes", so it is the other way round:

Template process ↔ Process template

Nonetheless I support @thomaslow's suggestion to refactor the retrieval of "Template processes". Even if having so many "template processes" in the system is unlikely, if the current query can be improved, that is exactly what we should do.