kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Script task as first task is not started #3672

Open matthias-ronge opened 4 years ago

matthias-ronge commented 4 years ago

Extracted from comment on issue #3279 by @mnscholz:

If the automatic script task is the first task of a process, it seems that the script doesn't get executed at all. The task displays as locked (strike-thru bell) forever.

BartChris commented 2 years ago

@matthias-ronge Hello,

While implementing a naive fix for enabling automatic first workflow tasks (https://github.com/kitodo/kitodo-production/pull/4853), I began to wonder how this should be seen in the context of the different process creation modes (https://github.com/kitodo/kitodo-production/issues/4322). If all those modes use the same template (workflow), the question arises how automatic starting tasks should be handled. In theory, whenever a process creation event occurs (triggered by whatever procedure), it would have to be checked whether the new process has an open automatic task that needs to be executed. And that check has to happen only after everything related to process creation (such as creating the METS file) is finished, since the task might operate on the METS file.
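To make the check described above concrete, here is a minimal sketch. It is not the code from the pull request and does not use Kitodo's real classes: `ProcessCreationHook`, `Task`, `TaskStatus`, and the `metsFileWritten` flag are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not Kitodo's actual API: every type here is invented.
public class ProcessCreationHook {

    enum TaskStatus { LOCKED, OPEN, INWORK, DONE }

    static final class Task {
        final String title;
        final boolean automatic;
        TaskStatus status;

        Task(String title, boolean automatic, TaskStatus status) {
            this.title = title;
            this.automatic = automatic;
            this.status = status;
        }
    }

    /**
     * Called once process creation has finished, whichever creation mode
     * triggered it. Starts every task that is both automatic and open,
     * and returns the titles of the tasks it started.
     */
    static List<String> onProcessCreated(List<Task> tasks,
                                         boolean metsFileWritten,
                                         Consumer<Task> scriptRunner) {
        List<String> started = new ArrayList<>();
        if (!metsFileWritten) {
            // The script may read the METS file, so do nothing until it exists.
            return started;
        }
        for (Task task : tasks) {
            if (task.automatic && task.status == TaskStatus.OPEN) {
                task.status = TaskStatus.INWORK;
                scriptRunner.accept(task);
                started.add(task.title);
            }
        }
        return started;
    }
}
```

The `scriptRunner` callback is left open on purpose: whether it runs the script inline, hands it to a thread, or queues it is exactly the design question discussed in this thread.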

Automatic first workflow steps are probably a prerequisite for fully automated workflows (https://github.com/kitodo/kitodo-production/issues/3333). On the other hand, I do not know what the effect would be if a mass import triggered a lot of CPU-intensive tasks (e.g. generation of image derivatives) by automatically starting a task for every process. My naive fix above worked for a simple use case, but I am not sure how and where in the code base this is best handled. It probably has to be addressed by someone with more knowledge of the specifics of process creation.

matthias-ronge commented 2 years ago

Our workflow engine is very old legacy code with 10 years of history, and we have never made large changes to it, only fixed immediate problems. Even a programmer who wrote it a long time ago is said to have declared, "I'll never touch the workflow engine again". At the moment, the thread that closes a task and opens the next one does the basic work for that task (such as creating or removing symbolic links); only a script is started asynchronously, in a TaskScriptThread (you can sometimes see these in the Taskmanager). We have to think carefully: if, for example, thousands of processes are created automatically for newspapers (this is already done by a thread), should each one initialize its own TaskScriptThread (they would pile up and possibly cause out-of-memory errors), or should the scripts run in the same thread (in which case creating the newspaper processes would become very, very slow)? And what if the thread crashes or the server needs a reboot in the meantime?

I think we would need a completely different design in the background to solve this in a clean way. I think there should be a separate task engine where each automatic task is actually represented with a thread (or a few, a fixed number though) that handles those tasks. Actually, I would even see this as a separate program, so that you can install multiple task engines on different servers and they work together. But for that, we would have to reprogram the whole task handling. Or maybe, in the future, make use of a powerful workflow engine and dispose of the self-made one.
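As a rough illustration of the "fixed number of threads" idea, the sketch below uses the JDK's `ExecutorService`. It is only an assumption about how such an engine could look: `BoundedTaskEngine` is a hypothetical class, not part of Kitodo, and it does not replace `TaskScriptThread`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a task engine with a fixed number of worker threads.
public class BoundedTaskEngine {

    private final ExecutorService workers;
    private final AtomicInteger completed = new AtomicInteger();

    public BoundedTaskEngine(int workerCount) {
        // At most workerCount scripts run at once; the rest wait in the queue.
        this.workers = Executors.newFixedThreadPool(workerCount);
    }

    /** Queues one automatic task instead of spawning a thread for it. */
    public Future<?> submit(Runnable automaticTask) {
        return workers.submit(() -> {
            try {
                automaticTask.run();
            } finally {
                completed.incrementAndGet();
            }
        });
    }

    /** Stops accepting tasks and waits for the queued ones to finish. */
    public boolean shutdownAndWait(long timeoutSeconds) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }

    public int completedCount() {
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedTaskEngine engine = new BoundedTaskEngine(4);
        for (int i = 0; i < 1000; i++) {
            engine.submit(() -> { /* placeholder for running a script */ });
        }
        engine.shutdownAndWait(30);
        System.out.println(engine.completedCount() + " tasks completed");
    }
}
```

With this shape, a thousand newly created processes cost a thousand small queue entries rather than a thousand threads. The queue lives in memory, though, so it does not answer the crash or reboot concern; that would need persistent state or a separate program as suggested above.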

Related: #4678

mnscholz commented 2 years ago

I see your point. Another situation where multiple automatic tasks would start at the same time is when you manually step a list of processes backward or forward (the "Bearbeitungsstatus hoch/runtersetzen" action, i.e. raising or lowering the processing status). I haven't tested the current behavior yet, but I seem to remember that in previous versions the automatic tasks wouldn't run...

As a (temporary) workaround, maybe it would help if the workflow editor didn't allow such tasks as starting tasks.
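Such an editor check could look roughly like the sketch below. It assumes that a workflow is a list of task definitions with an ordering number and an "automatic" flag; `FirstTaskValidator` and `TaskDef` are hypothetical names, not the workflow editor's real model.

```java
import java.util.List;

// Hypothetical sketch of a validation the workflow editor could run on save.
public class FirstTaskValidator {

    static final class TaskDef {
        final int ordering;
        final boolean automatic;

        TaskDef(int ordering, boolean automatic) {
            this.ordering = ordering;
            this.automatic = automatic;
        }
    }

    /** Returns true if any task with the lowest ordering is automatic. */
    static boolean hasAutomaticStartTask(List<TaskDef> tasks) {
        int first = tasks.stream()
                .mapToInt(t -> t.ordering)
                .min()
                .orElse(Integer.MAX_VALUE); // empty workflow: nothing to reject
        return tasks.stream()
                .anyMatch(t -> t.ordering == first && t.automatic);
    }
}
```

The editor could refuse to save the workflow when this returns true, or only show a warning.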

BartChris commented 2 years ago

It seems clear that improper use of automatic tasks (possibly even multiple chained automatic tasks) might lead to program crashes. So the question is whether the system admin should be the one to decide if the risk is acceptable.

"Actually, I would even see this as a separate program, so that you can install multiple task engines on different servers and they work together."

Such a mechanism could probably already be implemented by using ActiveMQ and having multiple workers that take over those tasks. But I do not know whether even the execution of hundreds of small scripts, each sending a message to ActiveMQ, might already be a problem.

This could probably not be combined with automatic "internal Kitodo actions" like generating images, since those have to be handled by Kitodo itself. So if somebody wants to trigger multiple automatic first tasks and also generate images in this context, that probably also has to be done outside of Kitodo. An alternative to not allowing automatic first tasks would therefore be an explicit warning that this feature has to be used carefully.