Open eugene-yang opened 2 years ago
Do you mean normalize diacritics or remove them? The current code normalizes them: https://github.com/hltcoe/patapsco/blob/master/patapsco/util/normalize.py#L240
I think we want to remove diacritics.
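For reference, the difference between the two options can be sketched with the standard library alone. This is a minimal illustration, not the code in `patapsco/util/normalize.py`; the function names `normalize_diacritics` and `remove_diacritics` are made up for this example.

```python
import unicodedata


def normalize_diacritics(text: str) -> str:
    """Keep accents, but use one canonical (composed) form for each character."""
    return unicodedata.normalize("NFC", text)


def remove_diacritics(text: str) -> str:
    """Drop accents by decomposing and discarding the combining marks."""
    # NFD splits "é" into "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left so the output is in a consistent form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    composed = "caf\u00e9"      # "é" as a single code point
    decomposed = "cafe\u0301"   # "e" plus a combining accent
    print(normalize_diacritics(decomposed) == composed)
    print(remove_diacritics(composed))
```

Two caveats with this approach: letters that have no canonical decomposition, such as "ø", pass through unchanged, and removal is not limited to Latin accents (Cyrillic "й" decomposes to "и" plus a breve, so it would become "и"). That is one more reason to make removal a user-selectable option.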
Accented characters should be normalized for better matching, or at least this should be an option the user can select.
Here is an example from CLEF.