simonmichael / shelltestrunner

Easy, repeatable testing of CLI programs/commands
GNU General Public License v3.0

multiline matching of stdout/stderr doesn't work as expected (cannot achieve it) #29

Open ppenguin opened 3 years ago

ppenguin commented 3 years ago

When testing this, it matches:

# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1.*/
>= 0

but when testing this, it doesn't (returns failure):

# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1.*Line 2.*/
>= 0

The regex-tdfa matcher is supposed to default to multiline mode, so I'd expect that one to work. But even if I explicitly try to include newlines in the catch-all, it doesn't work either (it also returns failure):

# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1(.|\n)*Line 2.*/
>= 0

Is there a syntax I can use to achieve multiline matching as-is, or does it require a mod to the code?

simonmichael commented 3 years ago

Sorry I'm not sure - needs debugging. Perhaps we are not calling it in multiline mode.

obfusk commented 3 years ago

It is called in multiline mode. But regex-tdfa has a non-standard multiline mode that combines what is usually known as "multiline" with inverse "dotall" and also disables matching newlines in inverted character classes (so you can't even use e.g. [^!]).

You can match a newline using "(.|\n)", but only with an actual newline in the pattern (since \n is just n to regex-tdfa). I don't think that shelltestrunner can currently do that for you.

obfusk commented 3 years ago

regex-tdfa does recognise [[:space:]] though, so this works:

# test multiline matches
$ printf "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"
> /.*Line 1(.|[[:space:]])*Line 2.*/
>= 0

obfusk commented 3 years ago

Also: fyi echo -e doesn't usually work with /bin/sh.

# with bash
$ echo "foo\nbar" 
foo\nbar
$ echo -e "foo\nbar" 
foo
bar
$ printf "foo\nbar\n"
foo
bar
# with /bin/sh (on my system)
$ echo "foo\nbar" 
foo
bar
$ echo -e "foo\nbar"
-e foo
bar
$ printf "foo\nbar\n"
foo
bar

The behaviour of echo regarding escapes and options differs greatly between systems. I recommend using printf instead (though you need to manually add a \n at the end).

ppenguin commented 3 years ago

@obfusk Thanks for the info, very useful. On my sh, echo -e does work as expected, but since in some cases I need compatibility with e.g. busybox, it's still a valuable comment which I will use.

As for the [[:space:]] workaround, very useful! I guess this greatly lessens at least the urgency of this issue.

What I didn't 100% understand is whether we should abandon the expectation to handle newlines in a standard way completely due to inherent limitations of regex-tdfa, or whether it would be possible to configure it in such a way that the behaviour is possible?

obfusk commented 3 years ago

What I didn't 100% understand is whether we should abandon the expectation to handle newlines in a standard way completely due to inherent limitations of regex-tdfa, or whether it would be possible to configure it in such a way that the behaviour is possible?

You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).
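As a rough illustration of that trade-off, here is a sketch using Python's re module rather than regex-tdfa. This is an assumption made purely for illustration: Python keeps "multiline" and "dotall" as separate flags, so it only approximates the two regex-tdfa modes described above.

```python
import re

text = "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"

# Roughly "multiline off": '.' may cross newlines, but '^' only
# anchors at the very start of the whole string.
dot_spans_lines = re.search(r"Line 1.*Line 2", text, re.DOTALL) is not None
caret_finds_line_2 = re.search(r"^Line 2", text, re.DOTALL) is not None

# Roughly "multiline on": '^' anchors at every line start, but '.'
# stops at each newline.
dot_spans_lines_m = re.search(r"Line 1.*Line 2", text, re.MULTILINE) is not None
caret_finds_line_2_m = re.search(r"^Line 2", text, re.MULTILINE) is not None

print(dot_spans_lines, caret_finds_line_2)      # True False
print(dot_spans_lines_m, caret_finds_line_2_m)  # False True
```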

It's unfortunate that regex-tdfa has chosen such non-standard behaviour: merging "multiline" and "dotall" into one option + not matching newlines in complementing character classes (which AFAIK no other regex implementation does). Thus (optionally, if you want backwards compatibility) using a different regex implementation might be preferable.

Another option would be to (have an option to) "preprocess" the regex and replace . with (.|[[:space:]]) (though this is non-trivial); e.g. using a syntax like /.../s (similar to e.g. Perl and JavaScript).
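To give an idea of why the preprocessing is non-trivial, here is a minimal sketch in Python. The function name expand_dots is hypothetical and nothing like it exists in shelltestrunner; it only handles backslash escapes and bracket expressions, and a real implementation would need to follow regex-tdfa's full pattern grammar.

```python
def expand_dots(pattern):
    """Replace each unescaped '.' outside bracket expressions
    with (.|[[:space:]]). Hypothetical helper, illustration only."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Escaped character: copy both characters unchanged.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Bracket expression: copy through its closing ']' unchanged.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading ']' is a literal
            while j < n and pattern[j] != "]":
                if pattern.startswith(("[:", "[.", "[="), j):
                    # Skip a nested POSIX item such as [:space:] as a unit.
                    close = pattern.find(pattern[j + 1] + "]", j + 2)
                    j = close + 2 if close != -1 else n
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == ".":
            out.append("(.|[[:space:]])")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_dots(".*Line 1.*Line 2.*"))
```

Even this small version has to track escapes and nested POSIX classes, and it makes no attempt to fix up inverted character classes, which still would not match newlines.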

simonmichael commented 3 years ago

Worth raising in regex-tdfa's issue tracker maybe ?

obfusk commented 3 years ago

Worth raising in regex-tdfa's issue tracker maybe ?

https://github.com/haskell-hvr/regex-tdfa/issues/11

ppenguin commented 3 years ago

You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).

@simonmichael This might actually be a nice option to have as a command line option to shelltest which is probably easy to implement? Then one could simply choose the behaviour based on the use case. Multi-line off would be perfectly suitable for cli-testing where e.g. a program feedback is checked (e.g. the contents of a help or error message), since in many cases the keywords/patterns will be more important than the lines they're on.

obfusk commented 3 years ago

@ppenguin fwiw I recently quickly hacked together a Python implementation of something similar to shelltest. It's unfinished, not entirely compatible, only implements part of the functionality, hasn't been documented yet, and probably has some bugs. But it does support proper multiline tests (and uses Python's more extensive regex capabilities):

# test multiline matches
$ printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
> /^line 1.*^line 2/ims

Note the /.../ims to enable case insensitive matching (i), multiline (m) & dotall (s).
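A minimal sketch of how such a /.../ims suffix could be mapped onto Python's re flags. The helper name compile_slash_regex is hypothetical and this is not the actual code of the implementation mentioned above; it simply assumes the flag letters follow the final slash.

```python
import re

# Flag letters as described above: i, m and s.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_slash_regex(spec):
    """Compile a '/pattern/flags' string (hypothetical helper)."""
    body, _, letters = spec.rpartition("/")
    if not body.startswith("/"):
        raise ValueError("expected /pattern/flags, got %r" % spec)
    flags = 0
    for letter in letters:
        flags |= _FLAGS[letter]  # KeyError on an unknown flag letter
    return re.compile(body[1:], flags)


text = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
print(compile_slash_regex("/^line 1.*^line 2/ims").search(text) is not None)  # True
print(compile_slash_regex("/^line 1.*^line 2/im").search(text) is not None)   # False
```

Dropping the s flag makes the second call fail because . no longer crosses the newline between the two lines.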

teto commented 3 years ago

I would very much like this. I am currently porting my application from Python to Haskell and I dearly miss the integrated test generation ("transcript") of https://github.com/python-cmd2/cmd2 where you can save the output of the application in order to test it at a later date. I've just written one test with shelltestrunner, but due to the size of the output it would be too impractical to maintain those transcripts manually. I mention this because it can be an inspiration for line handling too.

NB: I also find the expected output/command/get output quite hard to notice.

simonmichael commented 3 years ago

Would anybody like to propose/work on some improvements ?

iustin commented 3 years ago

Couldn't this problem (easy multiline matching) be solved by allowing multiple regexes per file descriptor? At least assuming that order of lines is not important.

I.e. I'm thinking of

printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
>>> /Line 1/
>>> /Line 2 bar$/
>>>= 0

And one would need to resort to proper multiline matching only if specific order is needed.
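The idea could be sketched like this in Python. The function name all_patterns_match is hypothetical, and re.MULTILINE is an assumption so that $ matches at the end of each line; this is not how shelltestrunner currently works.

```python
import re


def all_patterns_match(output, patterns):
    """True if every pattern matches somewhere in output, in any order."""
    return all(re.search(p, output, re.MULTILINE) for p in patterns)


output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
print(all_patterns_match(output, ["Line 1", "Line 2 bar$"]))  # True
print(all_patterns_match(output, ["Line 2 bar$", "Line 1"]))  # True
print(all_patterns_match(output, ["Line 1", "Line 2$"]))      # False
```

Because each pattern is searched independently, the order of the lines in the output does not matter, which is exactly the limitation noted above.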

simonmichael commented 2 years ago

Some years ago, regex-tdfa was the best compromise of power and portability. Is there anything better (more standard, more robust) nowadays ?