purebred-mua / tasty-tmux

Terminal user acceptance testing via tmux
GNU Affero General Public License v3.0

regexes fail when source contains high unicode chars #8

Closed romanofski closed 5 years ago

romanofski commented 5 years ago

Describe the bug When asserting against a dialog that is rendered as a stacked widget in the foreground, a Regex condition fails to match. I have not investigated beyond the obvious fiddling with the regular expression, so it might just as well be a PEBKAC.

To Reproduce Steps to reproduce the behavior:

  1. Assert against a dialog button in the foreground using a Regex condition
  2. See it not match (see example below)

Expected behavior The Regex condition matches the highlighted button, including its control sequences

Additional context This test case is from https://github.com/purebred-mua/purebred/pull/295

            startApplication
            step "start composition"
            sendKeys "m" (Substring "From")

            step "enter from email"
            sendKeys "Enter" (Substring "To")

            step "enter to: email"
            sendKeys "user@to.test\r" (Substring "Subject")

            step "enter subject"
            sendKeys "test subject\r" (Substring "~")

            step "enter mail body"
            sendKeys "iThis is a test body" (Substring "body")

            step "exit insert mode in vim"
            sendKeys "Escape" (Substring "body")

            step "exit vim"
            sendKeys ": x\r" (Regex $ "From: "
                             <> "\"Joe Bloggs\" <joe@foo.test>")

            step "abort mail"
            -- does not match
            sendKeys "q" (Regex (buildAnsiRegex [] ["30"] ["42"] <> "\\s+Keep"))

and the captured output looks like:

 raw: "          From: \"Joe Bloggs\" <joe@foo.test>                                     \n            To: user@to.test                                                    \n       Subject: test subject                                                    \n                                                                                \n\ESC[30m\ESC[43m -- Attachments-----------------------------------------------------------------\n\ESC[37m I --                                                                 text/plain\n\ESC[34m\ESC[40m                                                                                \n                                                                                \n                                                                                \n                                                                                \n               \ESC[33m\ESC[47m\9484\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472Keep draft?\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9488\ESC[34m\ESC[40m               \n               \ESC[33m\ESC[47m\9474             \ESC[30m\ESC[42m  Keep  \ESC[33m\ESC[47m   \ESC[30m  Discard  \ESC[33m             \9474\ESC[34m\ESC[40m               \n               \ESC[33m\ESC[47m\9492\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9472\9496\ESC[34m\ESC[40m               \n                                                                                \n                                                                                \n                                                                                \n                                                                                \n                                                 
                               \n                                                                                \n                                                                                \n                                                                                \n                                                                                \n                                                                                \n\ESC[30m\ESC[43mPurebred:                                                                       \n"

The rendered terminal output looks like:


                 From: "Joe Bloggs" <joe@foo.test>                                     
                  To: user@to.test                                                    
             Subject: test subject                                                    

       -- Attachments-----------------------------------------------------------------
       I --                                                                 text/plain

                     ┌───────────────────Keep draft?──────────────────┐               
                     │               Keep       Discard               │               
                     └────────────────────────────────────────────────┘               

      Purebred:                                                                       

The Keep button is highlighted in the captured output.

frasertweedale commented 5 years ago

Seems to be related to the "\\s+Keep" part. This is not a POSIX regex. Use " +Keep" or "[[:space:]]+Keep" instead.

I find this extremely weird because we were using \\s all over the place without any issue.

frasertweedale commented 5 years ago

Wait, seems to be platform related, i.e. it depends on how the system regex(3) library behaves. On BSD Text.Regex.Posix does not handle \\s, but it does on GNU+Linux (f30).
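
For reference, a minimal sketch of the portable spellings suggested above, assuming the regex-posix package is installed. The names `posixSpace`, `literalSpace`, `nonPortable` and `matches` are made up for this example and are not part of tasty-tmux.

```haskell
import Text.Regex.Posix ((=~))

-- POSIX bracket expression for whitespace; part of the POSIX ERE syntax.
posixSpace :: String
posixSpace = "[[:space:]]+Keep"

-- A literal space; also portable, but only matches the space character.
literalSpace :: String
literalSpace = " +Keep"

-- "\s" is an extension. Whether it works depends on the system regex(3).
nonPortable :: String
nonPortable = "\\s+Keep"

-- Hypothetical helper: does the pattern match anywhere in the haystack?
matches :: String -> String -> Bool
matches haystack pat = haystack =~ pat

main :: IO ()
main = do
  let button = "  Keep  "
  print (matches button posixSpace)
  print (matches button literalSpace)
  -- Result is platform dependent, so no expectation is stated here.
  print (matches button nonPortable)
```

The first two patterns stay within POSIX syntax, so they should behave the same on BSD and GNU+Linux.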

frasertweedale commented 5 years ago

Oh yeah, the amazing thing is, if you copy-paste a bit of the capture, say:

s = "             \ESC[30m\ESC[42m  Keep  \ESC[33m\ESC[47m   \ESC[30m  Discard  \ESC[33m             "

Then s =~ re :: Bool = True. But matching against the full pattern fails. Smells like a bug, or a surprising behaviour, perhaps in the glibc posix implementation.

frasertweedale commented 5 years ago

Seems to be related to high unicode chars (note the box drawing codepoints). BSD and Linux behave the same in this regard.

frasertweedale commented 5 years ago

OK, fun fact, works fine when the source string is a ByteString. Seems to be related to use of the Foreign.C.String.withCAString function in the RegexLike instance in Text.Regex.Posix.String (see https://hackage.haskell.org/package/regex-posix-0.95.2/docs/src/Text-Regex-Posix-String.html#line-76). So I think we should change both the Capture and the Regex pattern type to be ByteString, and that should make this problem go away.
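
A rough sketch of one plausible mechanism, using only base and bytestring. This is not confirmed anywhere in this thread. `withCAString` keeps only the low 8 bits of each `Char`, and the box drawing character `'\9472'` is U+2500, whose low byte is 0x00. C code reading the marshalled string would therefore stop at the first horizontal border character. The helpers `seenViaCAString` and `seenViaUtf8` are invented for this illustration, and `row` is a shortened stand-in for the dialog border in the capture.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Foreign.C.String (peekCAString, withCAString)

-- What a C function such as regexec(3) would see when a String is
-- marshalled with withCAString: every Char is cut down to its low 8 bits
-- and reading stops at the first NUL byte.
seenViaCAString :: String -> IO String
seenViaCAString s = withCAString s peekCAString

-- The same text encoded as UTF-8 first and handed over as bytes.
-- UTF-8 only emits a NUL byte for the NUL code point itself.
seenViaUtf8 :: String -> IO String
seenViaUtf8 s = B.useAsCString (encodeUtf8 s) peekCAString
  where
    encodeUtf8 = BL.toStrict . BB.toLazyByteString . BB.stringUtf8

main :: IO ()
main = do
  -- Shortened stand-in for the dialog border row in the capture.
  let row = "  \9484\9472\9472Keep draft?\9472\9488  "
  a <- seenViaCAString row
  b <- seenViaUtf8 row
  putStrLn ("chars in source:        " ++ show (length row))
  putStrLn ("visible via CAString:   " ++ show (length a))
  putStrLn ("visible via UTF-8 bytes: " ++ show (length b))
```

If this is what happens, anything after the top border of the dialog never reaches the regex engine. That would be consistent with the copy-pasted button row matching on its own while the full capture does not.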