PDFUA-208: OCR document with corrected transparent text based on ActualText #208

zakkinsey / pdfa-test-7

Open zakkinsey opened 3 years ago

zakkinsey commented 3 years ago

Jira issue originally created by user @calanca2:

This technique shows how to handle scanned text that has been processed with OCR. A transparent text layer is the prerequisite for tagging the text content and making it machine-readable. In this case, spelling mistakes in the transparent text are corrected by adding an ActualText entry to the Span marked-content sequence (container) that encloses the affected text.
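As an illustration of the technique described above, here is a minimal Python sketch that emits the content-stream operators for one corrected word. It is not taken from the test file itself. The function names, the font resource name `F1`, the coordinates, and the sample misrecognized word are all assumptions made for the example. It also assumes ASCII-only text and leaves out the MCID and structure-tree wiring that a tagged PDF would additionally need.

```python
def pdf_literal_string(text: str) -> str:
    """Escape ASCII text as a PDF literal string: backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def span_with_actual_text(ocr_text: str, corrected_text: str,
                          font: str = "F1", size: int = 12,
                          x: int = 72, y: int = 720) -> str:
    """Build a Span marked-content sequence around invisible OCR text.

    ocr_text is what the OCR engine produced (possibly misspelled);
    corrected_text is the replacement exposed through ActualText.
    Font name and position are hypothetical placeholder values.
    """
    lines = [
        # Open the Span container; ActualText carries the corrected spelling.
        f"/Span << /ActualText {pdf_literal_string(corrected_text)} >> BDC",
        "BT",
        f"/{font} {size} Tf",
        "3 Tr",  # text rendering mode 3: neither fill nor stroke (invisible)
        f"{x} {y} Td",
        f"{pdf_literal_string(ocr_text)} Tj",  # the uncorrected OCR output
        "ET",
        "EMC",  # close the Span container
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(span_with_actual_text("Accessibi1ity", "Accessibility"))
```

The OCR output stays in the text-showing operator so that it still lines up with the scanned image, while the corrected spelling is supplied only through ActualText on the enclosing Span.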

zakkinsey commented 2 years ago

@MarkusErle assigned issue to self

zakkinsey commented 2 years ago

@MarkusErle added a link to #209: This issue relates to #209

zakkinsey commented 2 years ago

@MarkusErle added a link to #149: This issue relates to #149

zakkinsey commented 2 years ago

@MarkusErle added a link to #228: This issue relates to #228

zakkinsey commented 2 years ago

Comment created by @MarkusErle:

Adding ActualText to Marked Content: see PDF 1.7, Table 338, Note 2

zakkinsey commented 2 years ago

Comment created by @PaulRayius:

Metadata committee: we feel this fails WCAG 1.4.5 (Images of Text) because images of text are not allowed, even though it passes the PDF/UA requirements. We think something about this should be added to the Description.

zakkinsey commented 2 weeks ago

@zakkinsey unassigned issue from @MarkusErle as part of jira->github migration