jan-christiansen / nextstep-plist

Parser and printer for NextStep style plist files

`pQuotedString` misinterprets backslashes due to `reads` trying to parse Haskell escape characters #2

Open bwbaugh opened 6 years ago

bwbaugh commented 6 years ago

The issue is that `pQuotedString` uses `reads`, which tries to interpret a quoted string as a Haskell string literal, instead of using a Parsec parser.

https://github.com/jan-christiansen/nextstep-plist/blob/48ec35d583d1c5857ed22fc182be25c2db3743ae/Text/NSPlist/Parsec.hs#L70-L75

For example, trying to parse a backslash in a quoted string:

> putStrLn "\"\\\""
"\"

fails:

> (reads :: ReadS String) "\"\\\""
[]

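This is because `reads` tries to interpret Haskell-specific ASCII escape codes, character escapes, and numeric escapes. In the example above, `\"` is read as an escaped double quote, so the string literal is never closed. Numeric escapes are decoded as well: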

> (reads :: ReadS String) "\"\\123\""
[("{","")]

See §2.6 of the Haskell 98 Report for more info: https://www.haskell.org/onlinereport/lexemes.html
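To make the three escape families concrete, here is a small base-only probe. `readQuoted` is a hypothetical helper (not part of the library), and `\UF700` is only an illustrative non-Haskell escape of the kind I would expect in a key-binding file.

```haskell
-- Base only: no Parsec is needed to observe how `reads` lexes a string literal.

-- | Hypothetical helper: run `reads` on a quoted string and keep the first result.
readQuoted :: String -> Maybe String
readQuoted input = case (reads :: ReadS String) input of
  ((str, _) : _) -> Just str   -- `reads` accepted the input as a Haskell literal
  _              -> Nothing    -- `reads` rejected the input

-- | Sample inputs, written as Haskell literals of the raw plist text.
samples :: [String]
samples =
  [ "\"\\n\""      -- raw text "\n":     character escape, decoded to a newline
  , "\"\\SOH\""    -- raw text "\SOH":   ASCII escape code, decoded to '\SOH'
  , "\"\\x7B\""    -- raw text "\x7B":   hexadecimal escape, decoded to '{'
  , "\"\\UF700\""  -- raw text "\UF700": not a Haskell escape, so it is rejected
  ]
```

In GHCi, `map readQuoted samples` should show the first three inputs decoded and the last one rejected, which is the same failure mode as the lone backslash above.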

One solution might be to forgo any unescaping and just return the raw string. Another solution might be to make the parsing and printing understand NeXTStep style escaping.
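A minimal sketch of the first option (return the raw string), assuming Parsec 3 and that a backslash always pairs with the character that follows it. `pQuotedStringRaw` is a hypothetical name, not something the library exports.

```haskell
import Text.Parsec

-- | Hypothetical stand-in for pQuotedString that performs no unescaping:
-- the body between the quotes is returned verbatim, backslashes included.
pQuotedStringRaw :: Parsec String u String
pQuotedStringRaw = char '"' *> (concat <$> many chunk) <* char '"'
  where
    chunk   = escaped <|> plain
    -- A backslash and the character after it are kept as-is, so an
    -- escaped double quote does not terminate the string.
    escaped = do
      _ <- char '\\'
      c <- anyChar
      return ['\\', c]
    -- Any other character except a backslash or the closing quote.
    plain   = (: []) <$> noneOf "\\\""
```

With this sketch a raw `"a\"b"` comes back as `a\"b`, leaving any decoding to the caller. A lone backslash right before the closing quote is still rejected, because it is read as the start of an escaped quote.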

Background

As a learning exercise, I’m looking into building a tool (https://github.com/gnarf/osx-compose-key/issues/17) that parses DefaultKeyBinding.dict files, which use backslashes and escaping heavily.

bwbaugh commented 6 years ago

I’ve come up with a very hacky workaround, just to unblock myself and see whether I can use this library on the file I’m interested in. A proper solution would probably use a parser, perhaps similar to https://stackoverflow.com/q/24106314/1988505.
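For comparison, here is a rough sketch of what a parser-based version might look like, assuming Parsec 3. The set of escapes it decodes (`\\`, `\"`, `\n`, `\t`) is my assumption about the format, not something taken from a spec, and `pQuotedStringDecoded` is a hypothetical name.

```haskell
import Text.Parsec

-- | Hypothetical parser-based alternative to the `reads` approach.
-- Decodes a small, assumed set of escapes and keeps every other
-- backslash sequence verbatim instead of failing on it.
pQuotedStringDecoded :: Parsec String u String
pQuotedStringDecoded = between (char '"') (char '"') (concat <$> many chunk)
  where
    -- Either a backslash escape or one ordinary character.
    chunk = (char '\\' *> (decode <$> anyChar))
        <|> ((: []) <$> noneOf "\\\"")

    decode :: Char -> String
    decode '\\' = "\\"
    decode '"'  = "\""
    decode 'n'  = "\n"
    decode 't'  = "\t"
    decode c    = ['\\', c]   -- unknown escape: keep the backslash and the character
```

Keeping unknown escapes verbatim means the result is not fully unambiguous (a decoded `\\` followed by `U` looks the same as a preserved `\U`), so a real fix would need to settle on the complete escape set and teach the printer the inverse mapping.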

Patch file

Apply with:

```bash
$ patch -p1 < «patchfile»
```

```diff
--- a/Text/NSPlist/Parsec.hs 2012-09-30 11:15:43.000000000 -0400
+++ b/Text/NSPlist/Parsec.hs 2017-10-22 21:52:31.000000000 -0400
@@ -3,6 +3,8 @@
   ) where

+import Data.List (intercalate)
+import Data.List.Split (splitOn)
 import Data.Word (Word8)
 import Numeric (readHex)
 import Control.Applicative ((<$>), (<*), (*>), (<*>), pure, (<|>), empty)
@@ -69,10 +71,18 @@

 pQuotedString :: Parsec String u String
 pQuotedString = do
-  input <- getInput
+  input <- hackyReplace <$> getInput
   case reads input of
-    ((str, rest):_) -> const str <$> setInput rest
+    ((str, rest):_) -> const (hackyUnreplace str) <$> setInput (hackyUnreplace rest)
     _ -> empty
+  where
+    replacements =
+      [ ("\\", "__backslash__")
+      , ("\\\"", "__escaped_double_quote__")
+      ]
+    hackyReplace = flip (foldr (uncurry replace)) replacements
+    hackyUnreplace = flip (foldr (uncurry (flip replace))) replacements
+    replace old new = intercalate new . splitOn old

 -- | Parses data that is represented by hexadecimal codes
 pBinary :: Parsec String u NSPlistValue
--- a/nextstep-plist.cabal 2017-10-22 21:53:18.000000000 -0400
+++ b/nextstep-plist.cabal 2017-10-22 21:56:05.000000000 -0400
@@ -21,7 +21,7 @@

 library
-  build-depends: base >= 4 && < 5, QuickCheck >= 2, pretty, parsec >= 3
+  build-depends: base >= 4 && < 5, QuickCheck >= 2, pretty, parsec >= 3, split
   ghc-options: -Wall
   exposed-modules: Text.NSPlist, Text.NSPlist.Pretty,
```