wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.18/markdown/
MIT License
836 stars 41 forks source link

Would you consider exposing a more structured error type? #108

Closed begleynk closed 2 months ago

begleynk commented 4 months ago

Would you consider exposing a more structured error type that gives users direct access to e.g. the positions of the originating errors? I'm trying to provide some custom error messages to users, but the Mdx errors are all Strings instead of something more structured.

To be more specific, I'd love to be able to give an error along these lines:

Expected a closing tag for `<Foo>`

1 | <Foo>
  | ^^^^^
  |   └ matching opening tag
2 |   Hello
3 | <Fo
  | ^^^
  |  └ expected closing tag
4 | More content...

In this case, the error string has the starting positions of both the opening and closing tag, but not the length/end point which would be needed to draw the lines under the offending tags.

If the error was instead something along the lines of...

pub enum Error {
  MdxClosingTagNotFound { tag: String, opening_tag_pos: Position, closing_tag_pos: Position }
  // Other error variants...
}

...it would allow for more customization of the errors.

The library could still have Display implementations for the error type for convenience and produce errors similar to how it does today.

Would you be open/interested in enabling something like this? Happy to take a look at implementing this if you'd like. I might need some pointers on where to start.

ChristianMurphy commented 4 months ago

Welcome @begleynk! Thanks for sharing your idea!

To preface, markdown-rs and the closely related unified js ecosystem, follow a variation on the Unix philosophy:

This module focuses on parsing valid markdown and mdx prioritizing (in order): following the CommonMark and MDX standards, being extensible to support many different use cases, and being fast and efficient.

To answer your questions:

begleynk commented 4 months ago

Thanks for the details!

Just to clarify - I agree that putting the rendering logic for the specific kind of error messages I want to show doesn't belong into markdown-rs.

What I'm really asking about is if you'd be interested in changing the return type for this function (and the _html equivalent) from Result<Node, String> to a more structured Result<Node, Error>, where the Error type would have more details about the error.

Right now the information about where exactly the error came from becomes harder to gather programmatically since it's a String,

Then the user would be able to customize the error presentation, wrapping the library for their own use case, and you could still have the Display implementation to give the current error message.

Does that make sense?

wooorm commented 4 months ago

to a more structured Result<Node, Error>, where the Error type would have more details about the error.

The string is already pretty structured, line, column, reason. What more do you need?

I think you should already be able to do what you want with that info. Perhaps it can be a “prettier” result, but what can be done with errors should be the same?

The reason I started simple is that this project is no_std + alloc, so it’s very minimal. And errors don’t exist in markdown itself, only in MDX. So pulling in a more complex project for errors will be unused by many users.

begleynk commented 4 months ago

The string is already pretty structured, line, column, reason. What more do you need?

I was hoping that e.g. get the starting position of the offending tag directly for the purpose of creating the custom error I described above. A workaround is using regex to pull them out of the error string.

I'm happy to go with that workaround for now if you feel there isn't much use for this outside our admittedly niche use case.

We just wanted to customize the error quite heavily so we were hoping to do this without resorting to regex, and figured this may be something others would be interested in.

Really impressed with the library. Thank you for creating and maintaining it ❤️

wooorm commented 4 months ago

<3

Well, you don’t need regex, this whole project doesn’t, should be a relatively small parse ;)

What info would you want? Is there more you’d want? Is a small struct with a point and reason enough?

begleynk commented 4 months ago

Well, you don’t need regex, this whole project doesn’t, should be a relatively small parse ;)

Fair point ;) And there's only 6 distinct error messages by my count in to_mdast.rs, so it's not a lot of work.

What info would you want? Is there more you’d want? Is a small struct with a point and reason enough?

That would be amazing. If possible it would be super useful to also have the end point of the tag being referenced, not just the start. But a brief glance at the code shows that info might not be available where those errors are generated?

Also some errors also reference two different tags (i.e. the open and close tag) - it would be great to have positions of both (hence my suggestion for the enum to describe the different cases).

But feel free to close this issue, unless you feel there's value in keeping it around.

wooorm commented 4 months ago

There are also errors thrown from the JavaScript parser that is passed in. As that parser is arbitrary, those errors are also arbitrary. to_mdast is one part, there are also many “parse errors”, see for example https://github.com/wooorm/markdown-rs/blob/60db8e5896be05d23137a6bdb806e63519171f9e/tests/mdx_jsx_flow.rs#L132.

I’m open to a PR that uses a simple structure, reason and point. If and when you work then you can investigate whether positions, and actually two positions, makes sense. I think it’s only very few cases where that info exists.

begleynk commented 4 months ago

Ah right got it. Thanks for the pointers. The JS parser may make things a bit more complicated I guess if you want to support arbitrary errors - String kind of make sense there.

I'll start with parsing the errors myself for now. Happy to report back later how that goes.

Thanks again!