rust-lang / regex

An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
https://docs.rs/regex
Apache License 2.0
3.54k stars 443 forks source link

Multiline error messages are bad for logs #683

Closed vorner closed 1 year ago

vorner commented 4 years ago

Hello

If there's an error in syntax of regular expression, the Display of the Error tries to be helpful and outputs the original pattern with an arrow pointing to the place where it happened.

use regex::Regex;

fn main() {
    let e = Regex::new("This is wrong )[").unwrap_err();
    println!("There's an error: {}", e);
}
There's an error: regex parse error:
    This is wrong )[
                  ^
error: unopened group

However, I feel like being multiline is not something one would expect from error types. From a more practical perspective, this makes these errors mess up log files, where it is a rule of thumb to never write anything multiline to allow easy grepping through it.

I don't have an idea what exactly to do about the situation ‒ I understand some might like the helpful multiline messages, so removing them completely probably is not the best option, but I'd also like to be able to output errors to logs ‒ and do that in generic way, without knowing what exact error it is, or even have it as one of the error cause.

BurntSushi commented 4 years ago

I don't necessarily have a problem with doing something that allows one to print an error message on a single line, but I don't think I'm willing to prioritize the log case over the better UX in the common case.

Personally, if it were me, I'd probably just split the error message into lines, drop everything but the last line and log that. I'm not sure the regex crate itself needs to do anything for that. I'm possibly open to other ideas.

vorner commented 4 years ago

Well, the problem with that is that:

So I was thinking more in the lines of regex::Error::single_line(self) -> Self method that would „configure“ the error. It doesn't sound exactly right either, though so I hoped someone could come up with a better option.

kchibisov commented 4 years ago

Personally, if it were me, I'd probably just split the error message into lines, drop everything but the last line and log that. I'm not sure the regex crate itself needs to do anything for that. I'm possibly open to other ideas.

Well the main issues here is that strategy to propagate errors is not useful for downstream to provide proper error messages that makes sense for the users, especially, when you try to throw them in GUI applications. Default formatting is bad, but it's subjective, but that's not how you can solve/improve that.

The main issue is that the information that Error is carrying can't be used by downstream other than just reading the actual debug impl and trying to parse everything from that String. Ideally Syntax error should send 'location' (offset in regex), 'kind of syntax error' (Information of what syntax error we've reached) and 'original regex string'. With those information, you can actually generate accurate error message by yourself quite easily.

So it should look more like.

pub enum Error {
    /// A syntax error.
    Syntax(String, usize, String),
    /// The compiled program exceeded the set size limit.
    /// The argument is the size limit imposed.
    CompiledTooBig(usize),
    /// Hints that destructuring should not be exhaustive.
    ///
    /// This enum may grow additional variants, so this makes sure clients
    /// don't count on exhaustive matching. (Otherwise, adding a new variant
    /// could break existing code.)
    #[doc(hidden)]
    __Nonexhaustive,
}

Where the first element in Syntax error is a source we were using to build the regex, the second is an offset of a syntax error in the first element, and the third is kind of that error, which could be some enum, like SyntaxErrorKind, but String is likely enough. Well, the first element might be redundant, but I guess nothing stops from sending it to.

Note, even ripgrep could benefit from that, so you can print error: part in red, so the actual error message will be more noticeable.

~ ➜ rg "hello("
regex parse error:
    hello(
         ^
error: unclosed group
kchibisov commented 4 years ago

Oh, in addition, if default formatting should be changed, I guess library should print a bit more boring debug formatting. Right now it looks like that.

Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    hello(
         ^
error: unclosed group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

However something like this makes a bit more sense for a library.

Syntax(source: "hello", offset: 5, kind: "unclosed group")
BurntSushi commented 4 years ago

The main issue is that the information that Error is carrying can't be used by downstream other than just reading the actual debug impl and trying to parse everything from that String. Ideally Syntax error should send 'location' (offset in regex), 'kind of syntax error' (Information of what syntax error we've reached) and 'original regex string'. With those information, you can actually generate accurate error message by yourself quite easily.

You can do that today using regex-syntax. Otherwise, this seems like a separate issue from the one described by the OP. I've already said that changing the default formatting is off the table. The entire point of the multi-line error message is that it is the default. If it wasn't the default, it would be almost pointless because folks aren't going to know to opt into it.

If you really want to make an earnest argument against the default formatting, then please open another issue. Otherwise, I don't see a reason to continue down that path.

So I was thinking more in the lines of regex::Error::single_line(self) -> Self method that would „configure“ the error. It doesn't sound exactly right either, though so I hoped someone could come up with a better option.

@vorner This sounds like something I might be open to, but to be honest, it seems pretty ham-fisted to me. If you have a requirement that log messages are all one line, then it seems to me like you need to enforce that requirement on your end.

BurntSushi commented 1 year ago

I'm going to close this for now.

I remain open to the idea of exposing some way of emitting single-line error messages. A mutator on the error type itself kind of seems non-ideal. Another possibility is a global flag or perhaps even an option on RegexBuilder, which appeals more to me personally. If someone wants to make a more flushed out proposal, I'd be happy to revisit this.