microsoft / OCR-Form-Tools

A set of tools to use in Microsoft Azure Form Recognizer and OCR services.
MIT License
507 stars, 170 forks

Redaction Toolkit | Support to redact some Latin ligature letters and letters with diacritics #1012

Closed cschenio closed 2 years ago

cschenio commented 2 years ago

Support to redact some Latin ligature letters and letters with diacritics

Purpose

We currently cannot handle accented letters such as é or ligature letters such as œ, both of which are common in French. This PR adds support for them and also extends the language coverage beyond French. Please check the code for the exact character set covered.
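To illustrate the kind of gap described above, here is a minimal sketch, not the toolkit's actual code. It assumes the redactor masks letters through a regular-expression character class; the names `ASCII_LETTERS`, `EXTENDED_LETTERS`, and `redact` are hypothetical, and the Unicode ranges are an illustrative guess rather than the charset this PR implements.

```python
import re
import unicodedata

# Hypothetical ASCII-only class: accented and ligature letters slip through.
ASCII_LETTERS = re.compile(r"[A-Za-z]")

# Hypothetical extended class (an assumption, not the PR's real charset):
# ASCII letters, Latin-1 Supplement letters (U+00C0-U+00FF without the
# multiplication sign U+00D7 and division sign U+00F7), and
# Latin Extended-A (U+0100-U+017F), which contains Œ (U+0152) and œ (U+0153).
EXTENDED_LETTERS = re.compile(
    r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F]"
)


def redact(text: str, pattern: re.Pattern, mask: str = "X") -> str:
    """Replace every character matched by `pattern` with `mask`."""
    # Compose to NFC first so that "e" + combining acute becomes a single "é".
    normalized = unicodedata.normalize("NFC", text)
    return pattern.sub(mask, normalized)


if __name__ == "__main__":
    sample = "cœur élevé"
    print(redact(sample, ASCII_LETTERS))     # XœXX éXXXé
    print(redact(sample, EXTENDED_LETTERS))  # XXXX XXXXX
```

With the ASCII-only class, œ and é survive redaction and leak part of the original text. The extended class masks them as well. Normalizing to NFC is a design choice in this sketch so that decomposed input (a base letter followed by a combining mark) is treated the same as precomposed input.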

Validation

Before merging this PR, please make sure the items below are done and marked with an 'x'.