Closed bertsky closed 2 years ago
Also, note that this does not cope with re-entering the same job again (which would also make it re-use the Controller-side `REMOTE_DIR`), only the same process. (We currently have no entry point for rerunning jobs from Kitodo.)
@bertsky Should we create a make target to delete files of WORKDIR of Manager or Controller? Or should we add a parameter to force deleting WORKDIRS before processing?
> Also, note that this does not cope with re-entering the same job again (which would also make it re-use the Controller-side `REMOTE_DIR`), only the same process.

But thinking about it, it may at least help to name `REMOTE_DIR` without the variable `$PID`, only with `$PROCESS_ID` and `$TASK_ID`, which are constant and therefore should allow for rerunning the job (regardless of how this might be triggered). What do you think?
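To make the naming idea concrete, here is a minimal sketch. It assumes `$PID` is the ID of the running script process; the `KitodoJob_` prefix and the helper name `remote_dir_name` are hypothetical and not taken from the actual scripts. It only illustrates that a name built from the two constant IDs is identical on every rerun.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the "KitodoJob_" prefix and the function name are
# illustrative assumptions, not the names used by the real scripts.
remote_dir_name() {
    local process_id="$1" task_id="$2"
    # Deliberately no shell PID ($$) in the name: the PID changes on every
    # invocation, whereas process ID and task ID stay constant for a job.
    printf 'KitodoJob_%s_%s\n' "$process_id" "$task_id"
}

# Example values only; in practice these would be passed in by the caller.
PROCESS_ID=42
TASK_ID=7
REMOTE_DIR="$(remote_dir_name "$PROCESS_ID" "$TASK_ID")"
echo "$REMOTE_DIR"
```

Running this twice prints the same directory name both times, which is what would let a rerun of the job find its earlier Controller-side directory.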
> Should we create a make target to delete files of WORKDIR of Manager or Controller? Or should we add a parameter to force deleting WORKDIRS before processing?

No, I wouldn't do that via makefile. There's a to-do in the comments to cron-schedule the removal on the Controller, which should suffice. And for the Manager, the same mechanism that removes finished process data should also be responsible for the `ocr-d/` side, so that's outside of the Manager's scope.
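As a rough illustration of what such a scheduled removal on the Controller could look like, here is a sketch under stated assumptions: the function name `cleanup_stale_jobs`, the `KitodoJob_*` directory pattern, and the retention period are all hypothetical, and `-mindepth`/`-maxdepth` are extensions available in GNU and BSD `find`. The actual to-do may well be solved differently.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper for a cron job on the Controller.
# Usage: cleanup_stale_jobs BASE_DIR DAYS
cleanup_stale_jobs() {
    local base="${1:?base directory required}"
    local days="${2:?retention in days required}"
    # Only look at direct children of BASE_DIR that match the (assumed)
    # job directory pattern and were last modified more than DAYS days ago.
    # Note: a directory's mtime changes only when entries directly inside
    # it are added, removed, or renamed.
    find "$base" -mindepth 1 -maxdepth 1 -type d \
        -name 'KitodoJob_*' -mtime +"$days" \
        -exec rm -rf -- {} +
}
```

A crontab entry could then call a small wrapper script around this function once per day; the schedule and script location are left open here because they depend on the deployment.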
> > Also, note that this does not cope with re-entering the same job again (which would also make it re-use the Controller-side `REMOTE_DIR`), only the same process.
>
> But thinking about it, it may at least help to name `REMOTE_DIR` without the variable `$PID`, only with `$PROCESS_ID` and `$TASK_ID`, which are constant and therefore should allow for rerunning the job (regardless of how this might be triggered). What do you think?

I don't think `$TASK_ID` is suitable to bring into the scope of the Controller, because it is specific to Kitodo.Production, and the Controller should be independent of the application that triggers the OCR process. I think it should be `$PROCESS_ID` with an application prefix, e.g. "Production" or "Presentation", handed over by the Manager, or something else to distinguish between scripts, e.g. for_production, for_presentation ...
It's not in the "scope" of the Controller, though. It's the Manager's choice. The Controller just gets a path name (ideally not conflicting with anything else). And I think that having `TASK_ID` in there is actually correct: Suppose you have a workflow with two places for OCR(-D): once for the page layout and text on the images, then some more steps in Production including export, and then again for document layout on the presentation METS. The second time must not clash with the first time, i.e. it should have different directories on the Controller.
OK, that is a good point. I have to think about this a bit more, because tasks can be deleted etc., but for the moment it sounds like the best way.
Fixes https://github.com/markusweigelt/kitodo_production_ocrd/issues/17
Note that due to https://github.com/OCR-D/core/issues/825 this still does not work completely, but there's nothing we can do about that on our side.