sjshuck / hs-pcre2

Complete Haskell binding to PCRE2
Apache License 2.0

pcre2

Regular expressions for Haskell.

Teasers

{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Text.Regex.Pcre2 (match)

licensePlate :: Text -> Maybe Text
licensePlate = match "[A-Z]{3}[0-9]{3,4}"

licensePlates :: Text -> [Text]
licensePlates = match "[A-Z]{3}[0-9]{3,4}"

case "The quick brown fox" of
    [regex|\bbrown\s+(?<animal>[A-Za-z]+)\b|] -> Text.putStrLn animal
    _                                         -> error "nothing brown"

let kv'd = lined . packed . [_regex|(?x)  # Extended PCRE2 syntax
        ^\s*          # Ignore leading whitespace
        ([^=:\s].*?)  # Capture the non-empty key
        \s*           # Ignore trailing whitespace
        [=:]          # Separator
        \s*           # Ignore leading whitespace
        (.*?)         # Capture the possibly-empty value
        \s*$          # Ignore trailing whitespace
    |]

forMOf kv'd file $ execStateT $ do
    k <- gets $ capture @1
    v <- gets $ capture @2
    liftIO $ Text.putStrLn $ "found " <> k <> " set to " <> v

    case myMap ^. at k of
        Just v' | v /= v' -> do
            liftIO $ Text.putStrLn $ "setting " <> k <> " to " <> v'
            _capture @2 .= v'
        _ -> liftIO $ Text.putStrLn "no change"

Features

Performance

Currently we are slower than other libraries. For example:

| Operation | pcre2 | pcre-light | regex-pcre-builtin |
| --- | --- | --- | --- |
| Compile and match a regex | 3.9 μs | 1.2 μs | 2.9 μs |

If regex processing itself is the bottleneck, pcre-light, pcre-heavy, or lens-regex-pcre are recommended instead of this library for the very best performance.

Unicode

| Encoding | text version | pcre2 version | Code unit representation |
| --- | --- | --- | --- |
| UTF-8 | ≥ 2 | ≥ 2.2 | Foreign.C.Types.CUChar |
| UTF-16 | < 2 | < 2.2 | Foreign.C.Types.CUShort |
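
To make the difference between the two code unit types concrete, here is a minimal sketch that encodes a single character as UTF-8 or UTF-16 code units by hand. It uses only base; `utf8Units` and `utf16Units` are hypothetical helper names invented for this illustration and are not part of pcre2 or text. The sketch does not reject lone surrogates, which a real encoder would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Foreign.C.Types (CUChar, CUShort)

-- | Encode one character as UTF-8 code units (8 bits each).
-- Hypothetical helper for illustration; not part of pcre2.
utf8Units :: Char -> [CUChar]
utf8Units c
    | n < 0x80    = [fromIntegral n]
    | n < 0x800   = [lead 0xC0 6, cont 0]
    | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
    | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading unit: length tag plus the highest payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation unit: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- | Encode one character as UTF-16 code units (16 bits each).
-- Hypothetical helper for illustration; not part of pcre2.
utf16Units :: Char -> [CUShort]
utf16Units c
    | n < 0x10000 = [fromIntegral n]
    | otherwise   =
        -- Outside the Basic Multilingual Plane: a surrogate pair.
        [ fromIntegral (0xD800 .|. (m `shiftR` 10))
        , fromIntegral (0xDC00 .|. (m .&. 0x3FF))
        ]
  where
    n = ord c
    m = n - 0x10000
```

For instance, the seven-character string `['n','a','\xEF','v','e',' ','\x1F98A']` yields 11 units under `concatMap utf8Units` but 7 under `concatMap utf16Units`, so the same position in a subject corresponds to a different code unit offset depending on the encoding.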

Wishlist

License

Apache 2.0.
PCRE2 is distributed under the 3-clause BSD license.

Main Author

©2020–2022 Shlomo Shuck