Open Minoru opened 5 years ago
we need to keep compatibility with current regex syntax, i.e. title =~ "regex" should keep working. I've no idea what syntax we use at the moment, though.
We're using the POSIX extended regex, that's inconvenient because it looks like rusts regex
crate syntax is: "similar to Perl-style regular expressions [...] Much of the syntax and implementation is inspired by RE2."
.
I couldn't find anything use-able in rust that implements posix regex. I guess that leaves us to make our own wrapper around reg{comp,exec,error,free}
.
Spent about an hour fiddling with a test for Parser::Errors::SynErr()
. My takeway is that it's pointless. A few errors can be triggered in an obvious way, e.g. "closeblock expected" is triggered by (a = "a"
. Other errors seem to be less obviously connected to the input: e.g. a = "a" xor
results in "EOF expected", not "invalid logop" as one might expect. Other errors, like "\"between\" expected" and "??? expected", seem to be some internal stuff, which I have no idea how to trigger.
This means it's not a requirement for the new filter parser to report errors exactly the same way as the existing parser does.
@Minoru What's next?
@pickfire The next, and the last, thing is to finish #701. I'm still working on it.
As my comments in #701 show, existing filter parser and matcher both have interesting corner cases, and I took this rewrite as an opportunity to fix them. Beyond any doubt, the new implementation has its own corner cases, and also bugs. Thus, it would be best if we don't just switch over to the new stuff; instead, there should be some grace period when two implementations co-exist, and their results are compared. Any discrepancies in the results should be reported to us and either fixed or explained. The question is: how do we do it?
We need a solution that:
@tsipinakis, @noctux, and me discussed this on IRC a while back, and came up with the following ideas:
I had some counter-points to each, but re-thinking it now, I came up with the following:
Matcher
class keeps a static std::set<std::string>
which is a set of errors. Each error mentions its kind and the filter expression that caused it. There are two possible kinds: Rust and C++ disagreed on compilation (one compiled, the other errored), or Rust and C++ disagreed on the result (one thinks it matches, the other thinks it doesn't)
The size of that std::set
is bounded by how many different filter expression a user has. It's virtually impossible for someone to have so many expressions that errors stop fitting into memory.
If the set of errors in not empty when the program finishes:
With that, the overall plan for this issue is as follows:
FilterParser
and Matcher
. Call them RustFilterParser
and RustMatcher
perhaps?Stages 4 and 6 might span multiple releases if we can't keep up with the bug reports.
Anyone seeing any holes in this?
wait for Enter or EOF
Or call libc::isatty()
on stdin to see if there's any point in asking for confirmation.
Steps 4 and 5 also give us time to work on error messages. For my initial PR, the only error-related requirement is for the parser to report existence of errors. To make Rust parser the default one, we'll have to make its messages useful to the users.
After a bit more thinking and a prod from @AnthonyMikh, I realized that we can use fuzzing and property testing to compare parsers/matchers! This is much more attractive approach: it offers faster feedback; it doesn't inconvenience users; and finally, it's personally interesting to me :)
Use fuzzing to check that parsers react the same way for each possible input. Since our new parser doesn't (or, at least, shouldn't) reject any inputs that the previous one didn't, we should always see both parsers succeeding or failing simultaneously.
We can't directly compare the resulting parse trees, though, because the new parser "remembers" types of RHS arguments, so e.g. age > 007
effectively becomes age > 7
in the new parser, while the old one stores "007" as-is. I think we could use the information from the new parser's tree to convert the info of the old tree into an appropriately typed value, and compare that.
Use QuickCheck-style property testing to generate well-formed filter expressions and fill a MockMatchable with some data, then test that both matchers return the same result.
But wait, won't that give us a ton of false positives as the suite finds all the things we intentionally broke, like comparing a string to a number? Well, here's the trick: we'll introduce new errors for that. Right now, MatcherError
only contains two possible errors, but we can add "type mismatch" to that list (we'll need it anyway, I just thought I'd add it after comparing the implementation). The test will check for that error, and if it pops up, it won't fail because it's clearly an intended behaviour of new parser.
VerboseParseError
type and how to translate it into a useful message like "found =!
, expected one of: =
, ==
, !=
, =~
, !~
".title = 42
. As mentioned in #900, current Rust Matcher assumes that the expression is well-formed and the user is not trying to compare incompatible types (like string and number in title = 42
). We can't make this assumption in real life, so we have to:
MatcherError
called "type mismatch" or something;I'm still busy investigating #232, but I do plan to get back to this eventually. But I won't mind if in the meantime someone works on any of this as well :)
Upon re-reading this issue, it becomes clear to me that it was a mistake to make the new parser store numbers as integers. We should've mimicked the old parser's behaviour, and store them as strings. That would make the results directly comparable, making it almost trivial to check for regressions. I am now working on a PR that undoes this mistake.
I still believe in everything I wrote in https://github.com/newsboat/newsboat/pull/701#issuecomment-593002181, but I think the parser is the wrong place to introduce the types. We should start from the other end, i.e. Matchable
, because that allows us to make the change in smaller steps without breaking anything. I filed #1053 about this.
make Rust FilterParser log how long it took to parse the filter make Rust Matcher log how long it took to check for match
Looks like some sort of performance log, I believe I can look into this. (it may take some time before I have some time to work on this)
Don't reserve it if you aren't going to work on it just yet ;)
Newsboat can show and hide items that match certain expressions, like
author == "Mark Twain"
ortitle =~ "C++"
. These expressions are parsed withFilterParser
class, andMatcher
performs the check.FilterParser
is currently implemented using Coco/R. The syntax of filter expressions is described by filter/filter.atg (docs for the format are in section 1.2 of Coco's user manual). I believe it can be re-implemented with nom crate, which we already use for a number of tasks; see rust/libnewsboat/src/fmtstrformatter/parser.rs for inspiration.Note that
FilterParser
's test coverage is a bit incomplete; I'd like to seeParser::Errors::SynErr()
tested as well, because we probably use it throughFilterParser::get_error()
(but you should check and prove me wrong if you can). The tests are in test/filterparser.cpp.Normally I'd ask to port one class at a time, but in this case it seems summarily faster and easier to simultaneously port
Matcher
as well.Matcher
walks the tree built byFilterParser
, and even though it's possible to do this over FFI, it requires writing a lot of bindings; for each of six node types, we'll need two functions. If we portMatcher
as well, we can useMatchable
over FFI, which is only two functions.Matcher
needs one more test, to trigger this error condition.If all of the above doesn't sound intimidating enough, here's one more constraint: we need to keep compatibility with current regex syntax, i.e.
title =~ "regex"
should keep working. I've no idea what syntax we use at the moment, though.Rough TODO for the implementer (comment here if something's unclear or missing):
add C++ tests forParser::Errors::SynErr()
throughFilterParser::get_error()
(test/filterparser.cpp)FilterParser
in Rust, preferably using nom crate (create rust/libnewsboat/src/filterparser.rs) (#900)FilterParser
tests to Rust (#900)Matcher
(test/matcher.cpp)Matchable
trait to Rust (#900)Matcher
to Rust (create rust/libnewsboat/src/matchable.rs and rust/libnewsboat/src/matcher.rs).Matcher::matches()
should take a trait object of Matchable (#900)Matcher
tests to Rust (#900)Matchable
:extern "C"
functions that callhas_attribute
andget_attribute
(add include/rs_matchable.h and src/rs_matchable.cpp)struct ForeignMatchable
that implements traitMatchable
(add rust/libnewsboat-ffi/src/foreignmatchable.rs), calling the aforementioned two C++ functionsMatcher
(add rust/libnewsboat-ffi/src/matcher.rs)Matcher
to use FFI instead of C++FilterParser
FilterParser
: