pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data
Other
214 stars 23 forks source link

Provide a simpler parser API #25

Closed MattesWhite closed 4 years ago

MattesWhite commented 4 years ago

This issue contributes to #23

For me the parser API of sophia is a bit confusing with its macros and Config.

I suggest the introduction of a Parser-trait:

pub trait Parser {
    fn parse_str<'src>(&'src mut self, input: &'src str) -> 
        Result<Box<dyn Iterator<Item=Result<[Term<Cow<'src, str>>; 3]>> + 'src>>;
    fn parse_read<'src, R: 'src + Read>(&'src mut self, input: &mut R) -> 
        Result<Box<dyn Iterator<Item=Result<[Term<Cow<'src, str>>; 3]>> + 'src>>;
}

For Quads a similar trait respectively

An implementation for the N-Triples parser, e.g.:

mod nt {
  #derive[default]
  struct Parser {
    strict: bool;
  }

  impl Parser {
    fn new_strict() -> Self { Self { strict: true, } }
  }

  impl Parser for Parser {
    ...
  }
}
pchampin commented 4 years ago

Re. traits vs. macros, big +1. Initially, I wanted to define a Parser trait, but was unabled to complete it and opted instead for the "abstract interface" + macros approach, but I am not entierly satisfied with it. So thanks for prompting me, I'll give it a try again.

Leaving aside the trait vs. macro problem, what you propose above is actually very close to the current API, expect that what you call Parser is currently called Config. And this was a deliberate choice, because the actual parser, in my view, is only created when you call parse_str or parse_read. You mention "storing state so they can lazily parse" (which I think is what current implementations are doing), but such state (e.g. prefix bindings in Turtle) should not be shared across multiple calls of parse_X. The only thing that is really persistent here is, therefore, the parsing configuration, hence the current name.

That being said, I agree that this subtle semantic distinctions may bring more confusion than clarity. So we might as well bite the bullet and use the name everyone expects.

To sum it up: my plan is to replace the current Config "protocol" by a Parser trait, offering roughly the same interface.

pchampin commented 4 years ago

The discussions about #8 may have impact on this issue. We actually have two options:

  1. define a dedicated sophia::parser::Error which would replace the ParserError variant of the current global error type (with its location attribute to know where the error happened), or
  2. let each parser have their own error type, exposed through an associated type, just like Graph and Dataset.

The second option is more light-weight, but then we lose the ability to uniformly get the location of the error. This could be mitigated by requiring Parser::Error to implement a WithLocation trait that would be defined in the parser module...

I think I like option 2 + WithLocation trait better. Any opinion?

MattesWhite commented 4 years ago

Maybe have a look at the nom crate. They spent quiet efford into improving parser errors in their last major update.

Can't provide further material as I'm on mobile

pchampin commented 4 years ago

Commit 9af95c5 is an attempt to address this issue. @MattesWhite , let me know if you are satisfied with it, and if so, feel free to close this issue.

Note that parsers are almost rid of the global error types; only the RDF/XML parser still depends on it, because changing that would involve a substantial refactoring...

MattesWhite commented 4 years ago

Nice work this should satisfy the most use cases.

The only thing that bothers me is the WithLocation-trait. I'm not sure if it is realy needed. All relevant parsers I know (nom, pest, rio) provide their own way of specifing location of an error. Each way differs somewhat but they are part of an error type and their textual representation. In addition, I think the information is only relevant for RDF that is written by hand. RDF that was created by a seriallizer shouldn't contain formal errors and implementing a serializer that corrects its serialization based on errors from a parser seems not to be realistic. In fact, the only one that benefits from locating an error are humans. As a result, there is no need to include Location as part of the parser-api as a good parser will include it anyway.

An alternative to removing WithLocation would be to introduce something like DisplayWithSource which would print return a pretty print of the wrong area with further information (cf the chic-crate) (kinda the rustc error messages). However, this is only realy applicable for String-based parsers.

pchampin commented 4 years ago

All relevant parsers I know (nom, pest, rio) provide their own way of specifing location of an error.

They all provide it, so it is reasonable to require this from any parser (and anyway, Location has an Unknown variant for the rare cases where this information would not be available).

They all provide it in their own way, so there is some value in unifying that with a unique trait (see more below).

In fact, the only one that benefits from locating an error are humans.

I could not disagree more. Take for example any good text editor or IDE: they use this information to provide highlighting and navigation. Ok, they are usually parsing textual representations to extract the location information, but if they could get it more directly, why prevent them to do so?

So sorry, but I still vote to keep WithLocation... And that does not preclude using something like chic for generating higher-level, user-facing errors. On the contrary, the information extracted from WithLocation will be useful in this scenario.

MattesWhite commented 4 years ago

Take for example any good text editor or IDE

Okay, I missed that use-case. It is okay for me to keep WithLocation. I only wanted to express that this is maybe a bit to much of unification.

To sum up, the design of the new Parser-trait is okay and definetly an improvement compared to the macro-solution before. Therefore, I consider this issue resolved. I will now start on refactoring the error handling of the XML parser.

pchampin commented 4 years ago

Thanks :)