rust-bakery / nom

Rust parser combinator framework
MIT License
9.45k stars 804 forks source link

`convert_error` assumes fixed-width characters #1201

Open SPuntte opened 4 years ago

SPuntte commented 4 years ago

Hi,

a nom and Rust newbie here. Thanks for developing such a great library :pray:

Recently, I have been tinkering with parsing a file format containing tab characters. It just so happens that tabs can make convert_error output really confusing.

A minimal example (using nom 6.0.0-alpha3),

use nom::{
    bytes::complete::tag,
    character::complete::space1,
    error::{convert_error, ParseError, VerboseError},
    sequence::delimited,
    Err, IResult,
};

fn foobar<'a, E: ParseError<&'a str>>(input: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("foo"), space1, tag("bar"))(input)
}

fn main() {
    let input = "foo\tbaz";

    match foobar::<VerboseError<&str>>(input) {
        Err(Err::Error(e)) => {
            println!("{}", convert_error(input, e));
        }
        _ => {}
    }
}

produces (YMMV, depending on terminal configuration)

0: at line 1, in Tag:
foo     baz
    ^

That is, convert_error assumes every output character (including tabs) to be one column wide.

This particular example isn't too bad yet. Given a more complex sequence of tokens after a tab character, you can probably see how misleading it can get, though.

Sure, I'd like my use case to "just work". However, looking at how the problem is solved in, say rustc1, it seems somewhat nontrivial. As a newcomer I really can't tell if a typical user needs nom to provide high-quality, human-readable error traces at all. Maybe you can tell me if it's something you want to see nom handling in the future?

IMHO, it would be nice to at least mention this assumption in the documentation and/or examples.

As said, I'm not too confident of a rustacean yet but I would like to get into contributing if this is something I can help with :)


[1] See NonNarrowChar, Loc, source_map::SourceMap::lookup_char_pos in rustc_span.

Geal commented 4 years ago

it would be good to have convert_error support this use case, but I'm not sure I have the bandwidth to work on this? Could you send a PR?

SPuntte commented 3 years ago

Hey, thanks for the reply. I will try hacking something together. However, being busy with other things™ too, it might take me a while.

Since last time, I did a little bit of research discovering codespan-reporting and annotate-snippets both of which have, unsurprisingly, encountered effectively the exact same issue already. They also seem to solve the problem of error reporting pretty well. Thus, I infer that it makes little sense for nom to delve too deep into that problem space, as someone using nom as a parser in a more complex application will likely set up user-facing error reporting using such library anyway, avoiding convert_error altogether.

For simpler use cases, such as mine, I still see the value of the convert_error trace. With my current knowledge, I would suggest nom to take the rustc route, replacing tabs in the input with a fixed amount of spaces. Alternatively, one could try something more sophisticated, e.g. via unicode-width, but that would add a dependency and yet it wouldn't be a 100% solution.

Any thoughts or comments are warmly welcome.