pkiraly / qa-catalogue

QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
GNU General Public License v3.0

Analysis parameter files should be written at the end of analysis #343

Closed (nichtich closed 8 months ago)

nichtich commented 8 months ago

It looks like analysis parameter files such as completeness.params.json are written at the start of the analysis (via processor.beforeIteration). As a result, the client shows the start time via analysisTimestamp, when it should show the end time of the analysis instead.
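A minimal sketch of the requested behaviour, written in Python purely for illustration and not taken from the project's code: the class `SketchProcessor`, the method names `before_iteration`, `process_record` and `after_iteration`, and the helper `run` are all hypothetical. Only the file name completeness.params.json and the field analysisTimestamp come from the report above. The idea is to keep the start time in memory and write the parameter file in the last hook, so that the timestamp reflects the end of the analysis.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class SketchProcessor:
    """Hypothetical stand-in for an analysis processor (not the project's class)."""

    def __init__(self, output_dir, params):
        self.output_dir = Path(output_dir)
        self.params = dict(params)
        self.started_at = None

    def before_iteration(self):
        # Only remember the start time; nothing is written to disk yet.
        self.started_at = datetime.now(timezone.utc)

    def process_record(self, record):
        # The actual per-record analysis would happen here.
        pass

    def after_iteration(self):
        # Write the parameter file last, so analysisTimestamp marks the end of the run.
        self.params["analysisTimestamp"] = datetime.now(timezone.utc).isoformat()
        path = self.output_dir / "completeness.params.json"
        path.write_text(json.dumps(self.params, indent=2), encoding="utf-8")
        return path


def run(processor, records):
    """Drive the three hooks in order and return the path of the written file."""
    processor.before_iteration()
    for record in records:
        processor.process_record(record)
    return processor.after_iteration()
```

With this ordering, a client that reads analysisTimestamp from the file sees the end time, and the absence of the file indicates that the analysis has not finished yet.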

pkiraly commented 8 months ago

It is testable. It is enough to run it on a smaller dataset (e.g. 100K records). That takes minutes, not 2 days.

nichtich commented 8 months ago

The visible timestamp in the client helps to tell whether and when the analysis has finished. The current setup with 73 million PICA records took 2 to 2.5 hours for each of completeness, classifications, and authorities.

pkiraly commented 8 months ago

Sorry, I do not understand this comment. By "the client", do you mean the web UI? The web UI already shows a timestamp. What is missing is explicit status information, such as started at <date time> or finished at <date time>. We could also add the duration of the process, such as finished at <date time> took <duration>. When I mentioned that it takes minutes (if you run the tool on 100K records), I meant everything, including indexing. Completeness etc. take some hours, but the indexing takes much longer.
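The status wording proposed above could be produced roughly as follows. This is only a sketch in Python under stated assumptions: the function names `format_duration` and `status_line`, the date format and the `h:mm:ss` duration format are all hypothetical choices, not something the web UI is known to use.

```python
from datetime import datetime, timedelta


def format_duration(delta):
    """Render a timedelta as h:mm:ss (an assumed format)."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:d}:{minutes:02d}:{seconds:02d}"


def status_line(started_at, finished_at=None):
    """Return 'started at ...' while running, 'finished at ..., took ...' once done."""
    if finished_at is None:
        return f"started at {started_at:%Y-%m-%d %H:%M:%S}"
    took = format_duration(finished_at - started_at)
    return f"finished at {finished_at:%Y-%m-%d %H:%M:%S}, took {took}"
```

Calling `status_line(start)` covers a run that is still in progress, while `status_line(start, end)` covers a completed run and includes its duration.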

nichtich commented 8 months ago

Sorry for the confusion. This has been done in https://github.com/pkiraly/qa-catalogue/commit/6812aed4382d6ea87b5fc9abb6ce656328f0c296