naomijub / edn-rs

[DEPRECATED]: Crate to parse and emit EDN
https://crates.io/crates/edn-rs
MIT License
81 stars 14 forks source link

Add context and cause to parse error messages #70

Closed jcsoo closed 4 years ago

jcsoo commented 4 years ago

Currently, parsing errors are not useful - most simply return a string {} could not be parsed without any context. This requires bisecting the input to identify the line that is causing the problem, then reading the source to guess at the actual cause.

Ideally the Error type would include a line number and character number along with an ErrorType enum uniquely specifying the cause of the failure, but even updating the error string to include the line number and a more descriptive message would be very helpful.

naomijub commented 4 years ago

It returns the symbol that broke. We don't identify edn by line, so we cannot include the line that the error occurred. As of now, I cannot include lines, but maybe in the future I can take a look at this.

What we could add is the following x chars, does this help?

naomijub commented 4 years ago

But you are welcome to try to include failure line

naomijub commented 4 years ago

@jcsoo So I got an idea to help but it affected significantly performance. The idea is to use enumerate to have the char count that the problem started. Does this help you?

jcsoo commented 4 years ago

One thing you may be able to do is to use the .as_str() method of Chars to get the subslice of remaining characters, use .as_ptr() to get the pointer to the beginning of that slice and then find the offset of that pointer relative to the full string, which you will have to pass to your parsing functions or store in a thread-local variable. That will tell you the number of bytes consumed up to that point. You can then iterate through the full string to calculate the line number and column number where the error occurred.

The advantage of this approach is that there should be no performance overhead unless if you actually have an error.

naomijub commented 4 years ago

This follows that same error pattern that breaking the lines at break lines, like \n, would cause. So calculating the line from this approach could generate distortions. But I could return something like a subset of the line.

naomijub commented 4 years ago

@jcsoo So I did the change to use enumerate as it was the best option as of now. Could you test please? https://github.com/naomijub/edn-rs/pull/73