qurator-spk / dinglehopper

An OCR evaluation tool
Apache License 2.0

Generate per-workspace CER + WER #35

Open mikegerber opened 3 years ago

mikegerber commented 3 years ago
  1. It's easy to calculate this from the individual CER/WER and the character/word counts.
  2. But how to save a global JSON report in the METS? It would not "manifest a physical page" which OCR-D seems to demand for any file
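
A minimal sketch of the calculation in point 1, assuming each page report provides its CER, WER, character count and word count. The field names `cer`, `wer`, `n_characters` and `n_words` and the `PageReport` class are illustrative assumptions, not a confirmed report schema. Each page rate is weighted by its count, so the result equals total errors divided by total characters (or words).

```python
from dataclasses import dataclass


@dataclass
class PageReport:
    # Hypothetical per-page record; field names are assumptions.
    cer: float
    wer: float
    n_characters: int
    n_words: int


def weighted_rate(rates, counts):
    """Count-weighted mean of per-page rates; pages with a zero count are skipped."""
    total = sum(c for c in counts if c > 0)
    if total == 0:
        return 0.0
    return sum(r * c for r, c in zip(rates, counts) if c > 0) / total


def workspace_rates(pages):
    """Return (CER, WER) over all pages of a workspace."""
    cer = weighted_rate([p.cer for p in pages], [p.n_characters for p in pages])
    wer = weighted_rate([p.wer for p in pages], [p.n_words for p in pages])
    return cer, wer


# Made-up example values, for illustration only.
pages = [
    PageReport(cer=0.10, wer=0.20, n_characters=1000, n_words=200),
    PageReport(cer=0.50, wer=0.60, n_characters=100, n_words=10),
]
global_cer, global_wer = workspace_rates(pages)
```

Note that a plain average of the two page CERs would give 0.30 here, while the weighted value is 150/1100, because the short page carries little weight.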

mikegerber commented 3 years ago

But how to save a global JSON report in the METS? It would not "manifest a physical page" which OCR-D seems to demand for any file

@kba @bertsky @cneud Any thoughts on this?

bertsky commented 3 years ago

But how to save a global JSON report in the METS? It would not "manifest a physical page" which OCR-D seems to demand for any file

@kba @bertsky @cneud Any thoughts on this?

Yes, this became possible when we agreed on an "official" way to have global (document-wide) files in the METS. You just put the file in there without a pageId (i.e. without referencing the file in the structMap) and use a certain convention for the file ID (with FULLDOWNLOAD, IIRC).

Now that different MIME types are allowed in fileGrps, having an output fileGrp with page-wise and document-global reports should be no problem.
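
To illustrate the idea, here is a rough sketch using only Python's standard library, not the OCR-D API. It builds a METS fragment in which a page-wise report is referenced from the physical structMap while the document-wide report is not. The fileGrp name, file IDs and paths are made up for illustration, and the exact file ID convention should be checked against the OCR-D specification.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def add_file(file_grp, file_id, href, mimetype="application/json"):
    """Append a mets:file with a mets:FLocat child to the given fileGrp."""
    f = ET.SubElement(file_grp, f"{{{METS}}}file", {"ID": file_id, "MIMETYPE": mimetype})
    ET.SubElement(
        f,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "FILE", f"{{{XLINK}}}href": href},
    )
    return f


mets = ET.Element(f"{{{METS}}}mets")
file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
# Hypothetical output fileGrp holding both kinds of reports.
file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", {"USE": "OCR-D-EVAL"})
struct_map = ET.SubElement(mets, f"{{{METS}}}structMap", {"TYPE": "PHYSICAL"})
seq = ET.SubElement(struct_map, f"{{{METS}}}div", {"TYPE": "physSequence"})
page = ET.SubElement(seq, f"{{{METS}}}div", {"TYPE": "page", "ID": "PHYS_0001"})

# Page-wise report: listed in the fileGrp and linked to its page via mets:fptr.
add_file(file_grp, "OCR-D-EVAL_0001", "OCR-D-EVAL/OCR-D-EVAL_0001.json")
ET.SubElement(page, f"{{{METS}}}fptr", {"FILEID": "OCR-D-EVAL_0001"})

# Document-wide report: listed in the fileGrp only, with no mets:fptr.
# The ID below is a guess at the convention, not a confirmed value.
add_file(file_grp, "OCR-D-EVAL_FULLDOWNLOAD", "OCR-D-EVAL/report.json")

linked_ids = {fptr.get("FILEID") for fptr in mets.iter(f"{{{METS}}}fptr")}
all_ids = [f.get("ID") for f in mets.iter(f"{{{METS}}}file")]
```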

bertsky commented 3 years ago

BTW I believe having a measurement of CER standard deviation or variance is also useful. See here for an implementation.
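
For illustration only (this is not the implementation referred to above), a small sketch of the spread of per-page CER values using Python's `statistics` module. The CER values are made up, and whether the population or the sample variant is appropriate is a design choice that depends on whether the pages are treated as the whole document or as a sample.

```python
import statistics

# Made-up per-page CER values, for illustration only.
page_cers = [0.02, 0.04, 0.06]

cer_mean = statistics.mean(page_cers)
# Population variant: the pages are the whole document.
cer_variance = statistics.pvariance(page_cers)
cer_stdev = statistics.pstdev(page_cers)
# Sample variant: the pages are a sample from a larger collection.
cer_sample_stdev = statistics.stdev(page_cers)
```

A high standard deviation alongside a low mean CER would suggest that a few pages are much worse than the rest, which a single workspace-wide figure would hide.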

mikegerber commented 3 years ago

(Closing the issue was an accident; I often hit the wrong buttons.)