kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Automatic export is triggered multiple times per process #6140

Open BartChris opened 1 month ago

BartChris commented 1 month ago

Describe the bug: When many automatic exports are triggered for multiple processes (e.g. after several automatic processing steps), the export for a process is sometimes triggered twice, and the export of the same process appears twice in the task manager.


I tried to trace the behaviour, and it seems that the automatic export is triggered twice when the previous task is closed and the next task is started and executed automatically. Maybe I am reading the code wrong, but the current implementation seems wrong.

When a task is closed, activateTasksForClosedTask is called, which activates the following tasks.

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/production/services/workflow/WorkflowControllerService.java#L437

The code inside activateTask then calls processAutomaticTask

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/production/services/workflow/WorkflowControllerService.java#L614

which will add all automatic tasks to the list automaticTasks

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/production/services/workflow/WorkflowControllerService.java#L678

For every task on the list, Kitodo starts a TaskScriptThread

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/production/services/workflow/WorkflowControllerService.java#L464-L467

When this TaskScriptThread runs, it checks the type of the task. For an export task, this leads to the execution of executeDmsExport (taskService.executeDmsExport(this.task);)

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/production/thread/TaskScriptThread.java#L76

This is where the problems might begin, because that method starts another TaskScriptThread

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/export/ExportDms.java#L133-L138

As I understand it, we now have two threads that end up doing exactly the same thing: exporting the process.
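The pattern described above can be sketched in isolation. This is a toy model, not Kitodo's code: the class name, method name, and thread names are all hypothetical, and the inner `join()` exists only to make the demo deterministic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical model of a task thread that starts a second thread for the same export. */
final class NestedExportThreads {

    /** Runs the pattern once and returns the names of all threads that did work. */
    static List<String> runAutomaticExport() {
        List<String> started = new CopyOnWriteArrayList<>();

        // Thread 1: stands in for the thread created for the automatic workflow task.
        Thread wrapper = new Thread(() -> {
            started.add(Thread.currentThread().getName());

            // Thread 2: stands in for the thread created inside the export call.
            Thread worker = new Thread(
                    () -> started.add(Thread.currentThread().getName()),
                    "export-worker");
            worker.start();
            try {
                worker.join(); // only here to keep the demo deterministic
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "workflow-task");

        wrapper.start();
        try {
            wrapper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return started;
    }

    public static void main(String[] args) {
        // One automatic export, two threads involved.
        System.out.println(runAutomaticExport());
    }
}
```

In this model a single automatic export involves two threads, which would match two entries for the same process in a task list.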

This is different from generateImages, which seems to be executed in the already existing thread.

https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/production/services/data/TaskService.java#L598

We should probably handle the export in the same way as generateImages and not start a new thread when there is already a thread for the export.
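A minimal sketch of that idea, again with hypothetical names rather than Kitodo's actual classes: the export action is called directly inside the thread that already exists for the workflow task, so no second thread is created.

```java
/** Hypothetical model: the export runs inline in the existing task thread. */
final class InlineExport {

    /** Stand-in for the export action; it reports which thread executed it. */
    static String exportProcess() {
        return Thread.currentThread().getName();
    }

    /** Starts the single task thread and returns the name of the thread that exported. */
    static String runAutomaticExport() {
        String[] executedOn = new String[1];

        // The only thread: it performs the export itself instead of delegating.
        Thread taskThread = new Thread(() -> executedOn[0] = exportProcess(), "workflow-task");
        taskThread.start();
        try {
            taskThread.join(); // join() makes the write to executedOn visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executedOn[0];
    }

    public static void main(String[] args) {
        System.out.println(runAutomaticExport());
    }
}
```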

Expected behavior: The export should only be triggered once per process.

Release: 3.6, Master

henning-gerhardt commented 1 month ago

I think this behavior is correct but misleading :-( The first task shown in the task manager is the task from the workflow. The second task in the task manager is the action executed by the workflow task. As both use the same wording, it is confusing.

edit: if you have a process with a lot of media data (50 GB or more, depending on how fast a file copy is), you will see that no export is done and no data is copied when the first task executes. The data is only exported and copied when the second task executes. Also look at the timestamps of the exported data.

apiller commented 1 month ago

But should the task be executed several times? As I understand it, the first entry is the automatic script and the second is the execution of that script. In this case the script is executed more than once.

henning-gerhardt commented 1 month ago

Did you run tasks in the task manager in parallel (config option taskManager.autoRunLimit set to a value greater than 1)?

Another reason could be exporting a whole year of a newspaper with all its daily issues at once. In this case we observed crazy, time- and resource-consuming behavior that ended in a java.lang.OutOfMemoryError.

apiller commented 1 month ago

The normal export, selecting issues and choosing export from the actions on the left, works fine. Only finishing a whole year at once, or even a month or less, with "set status of a process up" causes this. It even starts to export already finished processes.

henning-gerhardt commented 1 month ago

> Just finishing a whole year at once or even a month or less with "set status of a process up" is causing this. It even starts to export already finished processes.

We have made the same observation, but without "set status of a process up" (we do not use it in our workflow because, in our experience, it can break more things than it solves). I think this usage scenario is not fully covered by the current code.

I dug into this, and the code in the startExport() method (https://github.com/kitodo/kitodo-production/blob/master/Kitodo/src/main/java/org/kitodo/export/ExportDms.java#L102-L122) plays an important role in this behavior when year-level newspapers are exported from the UI. It is more complicated because both the already exported and the not yet exported issues of that year influence it.

Finding a better and more robust way to export all kinds of processes, including all hierarchy levels, is complicated.

BartChris commented 1 month ago

> I think this behavior is correct but misleading :-( The first task shown in the task manager is the task from the workflow. The second task in the task manager is the action executed by the workflow task. As both use the same wording, it is confusing.
>
> edit: if you have a process with a lot of media data (50 GB or more, depending on how fast a file copy is), you will see that no export is done and no data is copied when the first task executes. The data is only exported and copied when the second task executes. Also look at the timestamps of the exported data.

I think you are right. Two threads are started, but the first probably only acts as a wrapper thread that triggers the actual export thread. I suppose we do not need the wrapper thread and can trigger the export thread directly (https://github.com/kitodo/kitodo-production/pull/6141), but I suppose this will not address the larger problems with the export discussed here.

henning-gerhardt commented 1 month ago

One more note: the second thread is only created and started if the automatic export configuration is set to true. If it is not, the export is started immediately. So this configuration option influences how the export runs.
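The branching described here can be modelled roughly as follows. This is only a sketch under assumptions: the `asynchronousExport` parameter stands in for the configuration option, and the class and method names are hypothetical, not taken from ExportDms.

```java
/** Hypothetical model of a configuration flag that decides whether a second thread is used. */
final class ExportModeSwitch {

    /** Returns how many additional threads the export call itself started. */
    static int threadsStartedByExport(boolean asynchronousExport) {
        int started = 0;
        Runnable export = () -> { /* copying the files would happen here */ };

        if (asynchronousExport) {
            // Flag set: the export is handed to a new thread.
            Thread worker = new Thread(export, "export-worker");
            started++;
            worker.start();
            try {
                worker.join(); // only to keep the demo deterministic
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } else {
            // Flag not set: the export runs immediately in the calling thread.
            export.run();
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println(threadsStartedByExport(true));
        System.out.println(threadsStartedByExport(false));
    }
}
```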

BartChris commented 1 month ago

No hard evidence yet, but setting "ASYNCHRONOUS_AUTOMATIC_EXPORT" to false seems to bring some improvement on systems that combine multiple "set status of a process up" actions with an autoRunLimit greater than 1. In that scenario the export still runs in its own thread (the thread that was previously only a "wrapper" thread now does the actual work). The problem is that there is then no mechanism to close the export task, so it stays open. https://github.com/kitodo/kitodo-production/blob/ee97b6883e850d760947a26dc73c4a26806bb63e/Kitodo/src/main/java/org/kitodo/export/ExportDms.java#L140

So it may be worth investigating whether instantiating a thread that does little more than trigger another thread can lead to problems.
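The open-task symptom mentioned above can be illustrated with a toy model. Everything here is hypothetical (the `Task` and `Status` types, the method names, and the assumption that the asynchronous worker is what closes the task); it is not Kitodo's implementation, only a sketch of why a synchronous path needs its own closing step.

```java
/** Toy model (not Kitodo code) of an export task that has to be closed after the export. */
final class ExportTaskClosing {

    enum Status { OPEN, DONE }

    /** Hypothetical stand-in for a workflow task. */
    static final class Task {
        volatile Status status = Status.OPEN;
    }

    /** Hypothetical export; returns true on success. */
    static boolean export() {
        return true;
    }

    /** Asynchronous path (assumption): the worker thread closes the task after the export. */
    static Status runAsynchronous(Task task) {
        Thread worker = new Thread(() -> {
            if (export()) {
                task.status = Status.DONE;
            }
        }, "export-worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return task.status;
    }

    /** Synchronous path as described: the export runs, but nothing closes the task. */
    static Status runSynchronous(Task task) {
        export();
        return task.status;
    }

    /** Possible remedy: close the task explicitly after a successful synchronous export. */
    static Status runSynchronousAndClose(Task task) {
        if (export()) {
            task.status = Status.DONE;
        }
        return task.status;
    }

    public static void main(String[] args) {
        System.out.println(runAsynchronous(new Task()));
        System.out.println(runSynchronous(new Task()));
        System.out.println(runSynchronousAndClose(new Task()));
    }
}
```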

henning-gerhardt commented 1 month ago

Using "set status of a process up" may be the cause. It works differently for differently configured tasks.

This inconsistent behavior led us to decide against offering it as a regular user action. Even using it as an administrative action is dangerous, because you must know how each task behaves when the action is applied, and in some situations (exporting is one of them) we do not use it at all.