welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

OCR quality control: parliamentary records sample 6 of 6 #459

Closed BobBorges closed 4 months ago

BobBorges commented 5 months ago

description

We need to assess the quality of the OCR. To do that, we need to compare manually transcribed lines from the parliamentary records with the OCR output.
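One common way to make such a comparison is the character error rate (CER): the edit distance between the manual transcription and the OCR line, divided by the length of the transcription. The sketch below is only an illustration of that idea and is not taken from this repository; the function names are hypothetical, and the NFC normalization step is an assumption so that composed and decomposed diacritics (for example `å`) are not counted as errors.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr: str) -> float:
    """CER of an OCR line against a manually transcribed reference line."""
    # Assumption: normalize both sides to NFC before comparing.
    reference = unicodedata.normalize("NFC", reference)
    ocr = unicodedata.normalize("NFC", ocr)
    if not reference:
        return 0.0 if not ocr else 1.0
    return levenshtein(reference, ocr) / len(reference)
```

A CER of 0.0 means the OCR line matches the transcription exactly; punctuation and diacritics count as characters, which is why the transcription has to be precise.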

the task

In the attached CSV file, you will find links to randomized pages from parliamentary records that have been scanned and OCRed (under the "facs" column). Open each image in a web browser (use your betalab credentials). Under the "row_to_check" column, you will find the line number of a sampled line. Your job is to fill in the text exactly as it appears in the image in the "content" column. Take care to enter the text precisely, including any punctuation, diacritics, etc.

randomized_sample_5.csv
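Before uploading an annotated file, it may help to check that every sampled line actually has a transcription. The following is a minimal sketch of such a check, assuming the CSV is comma-delimited and has a header row with the "facs", "row_to_check" and "content" columns described above; the helper name `find_unfilled_rows` is hypothetical.

```python
import csv
import io


def find_unfilled_rows(csv_text: str) -> list[int]:
    """Return 1-based data-row numbers whose "content" cell is empty."""
    # Assumption: comma-delimited text with a header row.
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for index, row in enumerate(reader, start=1):
        # Treat a missing column or a whitespace-only cell as unfilled.
        if not (row.get("content") or "").strip():
            missing.append(index)
    return missing
```

To run it on a file, read the file as UTF-8 text and pass the string in, for example `find_unfilled_rows(open(path, encoding="utf-8", newline="").read())`.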

viremn commented 4 months ago

I'll take this task now. It'll be done by the end of today.

viremn commented 4 months ago

randomized_sample_5_annotated.csv

Here's the annotated document.