haskell / text-icu

This package provides the Haskell Data.Text.ICU library, for performing complex manipulation of Unicode text.
BSD 2-Clause "Simplified" License

New module Data.Text.ICU.Spoof #17

Closed by bhamiltoncx 9 years ago

bhamiltoncx commented 9 years ago

This is my first attempt at integrating ICU's Unicode spoof-checking library with Data.Text.ICU.

This library checks for nasty Unicode lookalikes:

λ import Data.Text
λ import Data.Text.ICU
λ areConfusable spoof (pack "Hello") (pack "World")
CheckOK
λ let cyrillic = "\x0410\x0412\x0421\x0415"
λ :print cyrillic
cyrillic = "АВСЕ"
λ areConfusable spoof (pack "ABCE") (pack cyrillic)
CheckFailed [MixedScriptConfusable,WholeScriptConfusable]
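
To illustrate why the second pair above is flagged, here is a minimal, self-contained sketch of the general idea behind confusable detection: map each character to a lookalike "prototype" and compare the results. It does not use ICU or this package. The `lookalikes` table, `skeleton`, and `confusable` are hypothetical names with a tiny hand-written mapping, assumed purely for illustration; the real library relies on ICU's much larger Unicode confusables data and reports richer results than a Bool.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)

-- Hypothetical, hand-written table (an assumption for this sketch):
-- a few Cyrillic capitals and the Latin capitals they resemble.
-- ICU ships a far larger data set.
lookalikes :: [(Char, Char)]
lookalikes =
  [ ('\x0410', 'A')  -- CYRILLIC CAPITAL LETTER A
  , ('\x0412', 'B')  -- CYRILLIC CAPITAL LETTER VE
  , ('\x0421', 'C')  -- CYRILLIC CAPITAL LETTER ES
  , ('\x0415', 'E')  -- CYRILLIC CAPITAL LETTER IE
  ]

-- Replace each character with its lookalike prototype, if it has one;
-- characters missing from the table are left unchanged.
skeleton :: String -> String
skeleton = map (\c -> fromMaybe c (lookup c lookalikes))

-- Treat two strings as confusable when their skeletons are equal.
confusable :: String -> String -> Bool
confusable a b = skeleton a == skeleton b

main :: IO ()
main = do
  let cyrillic = "\x0410\x0412\x0421\x0415"
  print (confusable "Hello" "World")   -- False
  print (confusable "ABCE" cyrillic)   -- True
```

Unlike this sketch, the module proposed here returns a list of the specific checks that failed (such as MixedScriptConfusable), as the session above shows.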

More tests are better, but this is a nice start.

C library: http://icu-project.org/apiref/icu4c/uspoof_8h.html

bos commented 9 years ago

This generally looks good, thanks. Would you mind ensuring that all of the lines fit within 80 columns?

bhamiltoncx commented 9 years ago

> This generally looks good, thanks. Would you mind ensuring that all of the lines fit within 80 columns?

Absolutely, will clean it up.

bos commented 9 years ago

All the little review comments aside, this is remarkably clean for a first attempt at Haskell in a couple of years. Nice work!

bhamiltoncx commented 9 years ago

OK. All the docs should be in ship-shape, and I think the BitMask class isn't nearly as ugly now. Let me know what you think!

bos commented 9 years ago

Luvverly jubbly!