slub / ocrd_manager

frontend for ocrd_controller and adapter towards ocrd_kitodo
MIT License

Benchmarking workflows #65

Open markusweigelt opened 11 months ago

markusweigelt commented 11 months ago
bertsky commented 11 months ago

There are two types of data here:

The former depends on the latter. For the latter, we rely on the Controller's (OCR-D) internal mechanisms to collect the primary data. Currently, by default, the ocrd.log file in the workspace contains runtime data (CPU time and peak memory), but one would still need to aggregate it across the individual processing steps. Alternatively, we could install a custom ocrd_logging.conf in the Controller that sends the profiling messages to an external syslogd (on the Manager). Either way, we must then parse the log messages generated by the ocrd.process.profile logger into our database.
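A minimal sketch of that last step follows. It is written in Python, the language OCR-D itself uses. It assumes the profile messages contain text like `Executing processor 'NAME' took Xs (wall) Ys (CPU)`, following the usual `ocrd.log` line prefix. The exact wording may differ between OCR-D versions, and this sketch does not parse the peak-memory figure. The regular expression `PROFILE_RE`, the helpers `parse_profile_lines`, `aggregate` and `store`, and the SQLite table `step_profile` are hypothetical. They are not part of ocrd_manager or OCR-D core.

```python
"""Sketch: collect per-step runtime from ocrd.process.profile log lines.

ASSUMPTION: matching lines look roughly like
  HH:MM:SS.mmm INFO ocrd.process.profile - Executing processor 'NAME' took Xs (wall) Ys (CPU)...
Verify the exact message format against a real ocrd.log before relying on this.
"""
import re
import sqlite3
from collections import defaultdict
from typing import Iterable

# Hypothetical pattern: adjust to the actual message format of the deployed OCR-D version.
PROFILE_RE = re.compile(
    r"ocrd\.process\.profile - Executing processor '(?P<proc>[^']+)' "
    r"took (?P<wall>[\d.]+)s \(wall\) (?P<cpu>[\d.]+)s \(CPU\)"
)


def parse_profile_lines(lines: Iterable[str]) -> list[tuple[str, float, float]]:
    """Return (processor, wall_seconds, cpu_seconds) for every matching log line."""
    records = []
    for line in lines:
        m = PROFILE_RE.search(line)
        if m:
            records.append((m["proc"], float(m["wall"]), float(m["cpu"])))
    return records


def aggregate(records: list[tuple[str, float, float]]) -> dict[str, tuple[float, float]]:
    """Sum wall and CPU time per processor across all steps of a workflow run."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for proc, wall, cpu in records:
        totals[proc][0] += wall
        totals[proc][1] += cpu
    return {proc: (t[0], t[1]) for proc, t in totals.items()}


def store(conn: sqlite3.Connection, workflow_id: str, records) -> None:
    """Insert raw per-step records into a hypothetical 'step_profile' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_profile "
        "(workflow_id TEXT, processor TEXT, wall_s REAL, cpu_s REAL)"
    )
    conn.executemany(
        "INSERT INTO step_profile VALUES (?, ?, ?, ?)",
        [(workflow_id, *r) for r in records],
    )
    conn.commit()


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, for illustration only.
    sample = [
        "12:00:01.000 INFO ocrd.process.profile - Executing processor "
        "'ocrd-cis-ocropy-binarize' took 4.5s (wall) 4.0s (CPU)( [--input-file-grp='OCR-D-IMG']",
        "12:00:09.000 INFO ocrd.process.profile - Executing processor "
        "'ocrd-tesserocr-recognize' took 10.0s (wall) 9.5s (CPU)( [--input-file-grp='OCR-D-BIN']",
        "12:00:10.000 INFO ocrd.workspace - unrelated line",
    ]
    recs = parse_profile_lines(sample)
    conn = sqlite3.connect(":memory:")
    store(conn, "job-1", recs)
    print(aggregate(recs))
```

The sketch keeps the raw per-step rows and aggregates them on demand. This leaves room to compare workflows later, for example by processor or by job. The same parser should also work on lines received via syslog, as long as the forwarded message text is unchanged.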

markusweigelt commented 11 months ago

The motivation for this issue came from an OCR-D call, where it was mentioned that "we welcome any benchmark data", presumably to help improve the quality of the processors. So I assume there should already be a way to evaluate such data and make it comparable. We could provide this data as well. Or is this simply the wrong approach, and does the Controller already come with monitoring?

Some ideas: