Open ppenguin opened 3 years ago
Sorry I'm not sure - needs debugging. Perhaps we are not calling it in multiline mode.
It is called in multiline mode. But regex-tdfa has a non-standard multiline mode that combines what is usually known as "multiline" with inverse "dotall" and also disables matching newlines in inverted character classes (so you can't even use e.g. [^!]).
You can match a newline using "(.|\n)", but only with an actual newline in the pattern (since \n is just n to regex-tdfa). I don't think that shelltestrunner can currently do that for you.
Also: fyi echo -e doesn't usually work with /bin/sh.
regex-tdfa does recognise [[:space:]] though, so this works:
# test multiline matches
$ printf "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"
> /.*Line 1(.|[[:space:]])*Line 2.*/
>= 0
On echo -e not usually working with /bin/sh:
# with bash
$ echo "foo\nbar"
foo\nbar
$ echo -e "foo\nbar"
foo
bar
$ printf "foo\nbar\n"
foo
bar
# with /bin/sh (on my system)
$ echo "foo\nbar"
foo
bar
$ echo -e "foo\nbar"
-e foo
bar
$ printf "foo\nbar\n"
foo
bar
The behaviour of echo regarding escapes and options differs greatly between systems.
I recommend using printf instead (though you need to manually add a \n at the end).
@obfusk Thanks for the info, very useful.
Indeed on my sh echo -e works as expected, but since in some cases I need compatibility with e.g. busybox etc, it's still a valuable comment which I will use.
As for the [[:space:]] workaround, very useful! I guess this greatly lessens at least the urgency of this issue.
What I didn't 100% understand is whether we should abandon the expectation to handle newlines in a standard way completely due to inherent limitations of regex-tdfa, or whether it would be possible to configure it in such a way that the behaviour is possible?
You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).
It's unfortunate that regex-tdfa has chosen such non-standard behaviour: merging "multiline" and "dotall" into one option + not matching newlines in complementing character classes (which AFAIK no other regex implementation does). Thus (optionally, if you want backwards compatibility) using a different regex implementation might be preferable.
Another option would be to (have an option to) "preprocess" the regex and replace . with (.|[[:space:]]) (though this is non-trivial); e.g. using a syntax like /.../s (similar to e.g. Perl and JavaScript).
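The preprocessing idea could be sketched roughly as follows. This is Python purely for illustration (shelltestrunner itself is written in Haskell), and the name expand_dots is hypothetical. It assumes POSIX-style syntax: a backslash escapes the next character outside brackets, and everything inside a bracket expression is copied verbatim. Even this small version has to track escapes and bracket expressions, which hints at why the rewrite is non-trivial.

```python
def expand_dots(pattern: str, replacement: str = "(.|[[:space:]])") -> str:
    """Replace each unescaped '.' outside a bracket expression (hypothetical helper)."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Escaped character (e.g. a literal dot): copy both characters untouched.
            out.append(pattern[i:i + 2])
            i += 2
        elif c == "[":
            # Bracket expression: copy verbatim up to its closing ']'.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading ']' is a literal member of the set
            while j < n and pattern[j] != "]":
                if pattern.startswith("[:", j):
                    # Skip a POSIX class such as [:space:] as one unit.
                    end = pattern.find(":]", j + 2)
                    j = end + 2 if end != -1 else n
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif c == ".":
            out.append(replacement)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(expand_dots("Line 1.*Line 2"))  # Line 1(.|[[:space:]])*Line 2
```

Escaped dots and dots inside brackets are left alone, so a\.b and [.] pass through unchanged.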
Worth raising in regex-tdfa's issue tracker maybe ?
@simonmichael Turning multiline mode off, as suggested above (. then matches newlines, but ^/$ only match at the start/end of the whole string), might actually be a nice command-line option for shelltest, and is probably easy to implement? Then one could simply choose the behaviour based on the use case.
Multiline off would be perfectly suitable for CLI testing where e.g. a program's feedback is checked (e.g. the contents of a help or error message), since in many cases the keywords/patterns will be more important than the lines they're on.
@ppenguin fwiw I recently quickly hacked together a Python implementation of something similar to shelltest. It's unfinished, not entirely compatible, only implements part of the functionality, hasn't been documented yet, and probably has some bugs. But it does support proper multiline tests (and uses Python's more extensive regex capabilities):
# test multiline matches
$ printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
> /^line 1.*^line 2/ims
Note the /.../ims to enable case insensitive matching (i), multiline (m) & dotall (s).
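For reference, here is how the three flags interact in Python's standard re module. That the /.../ims suffix maps onto exactly these re flags is an assumption; the tool's source isn't shown in this thread.

```python
import re

output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
pattern = r"^line 1.*^line 2"

# i + m + s: case-insensitive, ^ matches at each line start, . matches newlines.
all_flags = re.search(pattern, output, re.IGNORECASE | re.MULTILINE | re.DOTALL)

# Without DOTALL, '.' stops at the newline, so '^line 2' is never reached.
no_dotall = re.search(pattern, output, re.IGNORECASE | re.MULTILINE)

# Without MULTILINE, '^' only matches at the very start of the string.
no_multiline = re.search(pattern, output, re.IGNORECASE | re.DOTALL)

print(all_flags is not None, no_dotall is not None, no_multiline is not None)
# prints: True False False
```

Only the combination of multiline and dotall lets one pattern anchor on two different lines.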
I would very much like this. I am currently porting my application from Python to Haskell and I dearly miss the integrated test generation ("transcript") of https://github.com/python-cmd2/cmd2, where you can save the output of the application in order to test it at a later date. I've just written one test with shelltestrunner, but due to the size of the output it would be too impractical to maintain those transcripts manually. I mention this because it can be an inspiration for line handling too.
NB: I also find the expected output/command/get output quite hard to notice.
Would anybody like to propose/work on some improvements ?
Couldn't this problem (easy multiline matching) be solved by allowing multiple regexes per file descriptor? At least assuming that order of lines is not important.
I.e. I'm thinking of
printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
>>> /Line 1/
>>> /Line 2 bar$/
>>>= 0
And one would need to resort to proper multiline matching only if specific order is needed.
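The proposed semantics amount to requiring every pattern to match somewhere in the output, regardless of order. A minimal sketch in Python, not shelltestrunner's actual implementation: all_match is a hypothetical name, and Python's re stands in for regex-tdfa.

```python
import re


def all_match(patterns, output):
    """Return True if every regex matches somewhere in output, in any order."""
    # re.MULTILINE makes ^ and $ match at line boundaries, so 'Line 2 bar$' works.
    return all(re.search(p, output, re.MULTILINE) for p in patterns)


output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
print(all_match([r"Line 1", r"Line 2 bar$"], output))  # True
print(all_match([r"Line 2 bar$", r"Line 1"], output))  # True (order is irrelevant)
print(all_match([r"Line 1", r"Line 4"], output))       # False
```

Because each pattern is checked independently, none of them needs to span a newline, which sidesteps the dotall question entirely.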
Some years ago, regex-tdfa was the best compromise of power and portability. Is there anything better (more standard, more robust) nowadays ?
When testing this, it matches:
but when testing this, it doesn't (returns failure):
The regex.TDFA matcher is supposed to default to multiline, so I'd expect that one to work. But even if I explicitly try to include newlines for the catch-all, it doesn't work either (also returns failure):
Is there a syntax I can use to achieve multiline matching as-is, or does it require a mod to the code?