knrafto / language-bash

Parse and pretty-print Bash shell scripts
BSD 3-Clause "New" or "Revised" License
35 stars 9 forks source link

Brace expansion is not correct #7

Closed drvink closed 9 years ago

drvink commented 9 years ago
bash$ echo {{a,b}
{a {b
ghci> map (\c -> case c of Char x -> x) (head (braceExpand (map Char "{{a,b}")))
"{{a,b}"

Since the bash expansion algorithm knows about quoting and escaping, this would need to be made aware of those too. (It's really ugly...)

bash$ echo {q\{b{zp,\}c}{kmy\}}{tjl}
{q{bzp{kmy}}{tjl} {q{b}c{kmy}}{tjl}
knrafto commented 9 years ago

Thanks, I'm going to need to write a test suite soon. I think expanding with quoting works, because braceExpand looks for unquoted braces (Char '{' and Char '}'). To get a Word, it's easiest to use Text.Parsec.parse Language.Bash.Parse.Word.word. Is this common enough to have a parseWord function either in Language.Bash.Parse or Language.Bash.Parse.Word?

knrafto commented 9 years ago

Should be fixed now.

> let testExpand s = map Language.Bash.Pretty.prettyText $ Language.Bash.Expand.braceExpand (case Text.Parsec.parse Language.Bash.Parse.Word.word "" s of Right w -> w)
> testExpand "{a,b}"
["a","b"]
> testExpand "{{a,b}"
["{a","{b"]
> testExpand "{a\\{,b\\,c}"
["a\\{","b\\,c"]
drvink commented 9 years ago

While that one's now fixed, it looks like there's more cases that don't work. I spent an embarrassingly long time trying to implement a parser for this myself--the grammar is so hideous that the easiest way to get a working clone might very well be to directly translate bash's imperative mess.

I barely speak Haskell, but here's some QuickCheck code to exercise the brace expander. (I originally wrote this in F# with FsCheck and then had to figure out how to translate it.)

import System.Process (readProcess)

import Text.Regex (subRegex)
import Text.Regex.TDFA (makeRegex, match)
import Test.QuickCheck
import Test.QuickCheck.Monadic (assert, monadicIO, run)
import Language.Bash.Expand (braceExpand)
import Language.Bash.Parse.Word (word)
import Language.Bash.Pretty (prettyText)
import Text.Parsec (parse)

testExpand :: String -> [String]
testExpand s =
  map prettyText $ braceExpand (case parse word "" s of Right w -> w)

charset :: Gen String
charset = elements strs
  where
    strs = ["", ",", "\\{", "\\}", "\\,", "\\ "] ++
           map (:[]) ['a'..'z']

junk :: Int -> Gen String
junk sz = do
    xs <- resize sz $ listOf1 charset
    return $ foldr (++) "" xs

maybeBraced :: Gen String
maybeBraced = do
    s <- junk 10
    frequency [(5, braced s),
               (1, llbraced s),
               (1, lrbraced s),
               (1, rlbraced s),
               (1, rrbraced s),
               (1, str_only s)]
  where
    braced s   = return $ "{" ++ s ++ "}"
    llbraced s = return $ "{" ++ s
    lrbraced s = return $ "}" ++ s
    rlbraced s = return $        s ++ "{"
    rrbraced s = return $        s ++ "}"
    str_only   = return

maybeStr :: Gen String
maybeStr = oneof [blank, maybeBraced]
  where
    blank = return ""

braceExpr :: Gen String
braceExpr = sized braceExpr'
  where
    braceExpr' 0 = maybeStr
    braceExpr' n
      | n > 0 = do
          x <- braceExpr' $ n `quot` 2
          y <- braceExpr' $ n `quot` 2
          return $ x ++ y
      | otherwise = error "wao"

expandWithBash :: String -> IO String
expandWithBash str = do
    expn <- readProcess "/bin/bash" ["-c", "echo " ++ str] ""
    return $ filter (\x -> x /= '\r' && x /= '\n') expn

prop_expandsLikeBash :: String -> Property
prop_expandsLikeBash str = monadicIO test
  where
    re = makeRegex "\\\\(.)"
    test = do
        bash <- run $ expandWithBash str
        let expn = testExpand str
            joined = unwords expn
            m = match re joined
            check = if m then subRegex re joined "\\1" else joined
        run $ putStrLn bash
        run $ putStrLn check
        assert $ bash == check

main :: IO ()
main = quickCheck $ forAll braceExpr prop_expandsLikeBash
knrafto commented 9 years ago

I'm still working, and I've gotten pretty close. I've switched from using Parsec to a Word -> [(a, Word)] type parser, and the problem is getting the parses in the right order to the one we want comes first.

drvink commented 9 years ago

For reference, bash's algorithm is here, though reading it may result more in confusion than in enlightenment.

knrafto commented 9 years ago
/* We ignore an open brace surrounded by whitespace, and also
    an open brace followed immediately by a close brace preceded
    by whitespace.  */

This is exactly what I needed. Thanks!