pkiraly / qa-catalogue

QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
GNU General Public License v3.0

Parallel processing #278

Open nichtich opened 1 year ago

nichtich commented 1 year ago

Some analysis tasks can be run in parallel. This can be done in do_all_analyses of common-script, but it requires knowing which tasks depend on each other. Parallel execution is already possible by starting multiple instances of the analysis with different tasks (e.g. one process running validate,validate_sqlite and one process running completeness,completeness_sqlite), but dependencies among tasks are not checked.
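To illustrate the dependency check, here is a minimal Python sketch of a scheduler that starts a task only after its prerequisites have finished. It is not how common-script works today. The dependency map is an assumption made for the example (that each *_sqlite task needs the output of its base task), and `run_task` is a hypothetical placeholder for launching the real analysis.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# ASSUMPTION: task -> set of tasks it must wait for. Illustrative only;
# the real dependencies would have to be taken from the project.
DEPENDENCIES = {
    "validate": set(),
    "validate_sqlite": {"validate"},
    "completeness": set(),
    "completeness_sqlite": {"completeness"},
}


def run_in_dependency_order(dependencies, run_task, max_workers=2):
    """Run tasks in parallel; a task starts only once its prerequisites are done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites have all completed.
            for task in sorter.get_ready():
                running[pool.submit(run_task, task)] = task
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # re-raise the task's exception, if any
                sorter.done(task)  # unlocks tasks that depend on this one
                finished.append(task)
    return finished


def run_task(name):
    # Hypothetical placeholder: a real version would start the analysis process.
    return name


order = run_in_dependency_order(DEPENDENCIES, run_task)
```

With this map, validate and completeness can run side by side, while each *_sqlite task waits for its base task. The completion order of independent tasks is not deterministic.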

Most individual analysis tasks can also be sped up by parallel programming; how depends on the type of task. If the task involves parsing all of the input records, one thread should do the parsing and distribute subsets of records to other threads.
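The one-parser, many-workers idea could be sketched as follows. This is a Python illustration of the pattern only, not code from the project: the function names, the batch size, and the "analysis" (counting non-empty records) are all made up for the example. In CPython, threads would not speed up CPU-bound work, so a real implementation would need processes or a language with true thread parallelism.

```python
import queue
import threading

SENTINEL = None  # tells a worker that no more batches will arrive


def parse_records(lines, batch_size, work_queue, n_workers):
    """Single parser thread: parse the input and hand out batches of records."""
    batch = []
    for line in lines:
        batch.append(line.strip())  # stand-in for real record parsing
        if len(batch) == batch_size:
            work_queue.put(batch)
            batch = []
    if batch:
        work_queue.put(batch)  # last, possibly smaller, batch
    for _ in range(n_workers):
        work_queue.put(SENTINEL)  # one stop signal per worker


def analyse_batches(work_queue, results, lock):
    """Worker thread: take batches from the queue until the sentinel arrives."""
    while True:
        batch = work_queue.get()
        if batch is SENTINEL:
            break
        # Hypothetical analysis: count the non-empty records in the batch.
        count = sum(1 for record in batch if record)
        with lock:
            results.append(count)


def run_pipeline(lines, n_workers=3, batch_size=2):
    work_queue = queue.Queue(maxsize=8)  # bounded, so the parser cannot run far ahead
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=analyse_batches, args=(work_queue, results, lock))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    parser = threading.Thread(
        target=parse_records, args=(lines, batch_size, work_queue, n_workers)
    )
    parser.start()
    parser.join()
    for worker in workers:
        worker.join()
    return sum(results)


total = run_pipeline(["rec1\n", "rec2\n", "\n", "rec3\n", "rec4\n"])
```

The bounded queue keeps memory use flat when parsing is faster than the analysis, because the parser blocks once eight batches are waiting.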

pkiraly commented 1 year ago

There are two problems to solve:

All in all, I think this is not a simple ticket; it should be a "milestone" or "epoch" with several child tickets.