uhop / stream-json

The micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. It can parse JSON files far exceeding available memory streaming individual primitives using a SAX-inspired API.
Other
960 stars 47 forks source link

Output Error position when reporting the error form the Parser #131

Closed adelbke closed 1 year ago

adelbke commented 1 year ago

I noticed that the errors outputted by the parser do not mention the position of the parsing error and I think it should.

I'm ready to contribute to make this improvement but I lack the skills to do so. I know that the intervention that needs to be done is in the Parser class and found this occurence

case 'objectStop':
          patterns.comma.lastIndex = index;
          match = patterns.comma.exec(this._buffer);
          if (!match) {
            if (index < this._buffer.length || this._done) return callback(new Error("Parser cannot parse input: expected ','"));
            break main; // wait for more input
          }

the error passed in the callback doesn't contain the position

uhop commented 1 year ago

Parser is supposed to deal with valid JSON files, so it does not include a position of the parsing error.

Because the main goal of the package is to deal with huge files (otherwise users can use JSON.parse()):

So unless the file in question was produced by stream-json I suggest dealing with tools used to produce it.

Yet there is a thoroughly documented helper to verify that an input stream is a valid JSON: Verifier.

The topic of an exact position in a huge file was discussed before. You may find this discussion useful, if a user cannot open a file with an editor yet wants to see where the error has occurred.

PS: Parser's documentation mentions Verifier and its purpose right at the top. What went wrong? How can I improve the documentation to help you and other users to understand this feature better?

adelbke commented 1 year ago

My apologies, I guess I missed the Verifier and didn't quite understand it properly.

I'm still learning how to handle streams in Javascript hence my lack of expertise.

Unfortunately I am parsing a file that is not produced by me and therefore have no control on the correctness. The only reason I ended up having this issue is because someone forgot a comma and submitted the JSON file to me that way and it broke my code. I just want to handle this to make it more robust.

I would also appreciate being able to let people know how they're making mistakes when submitting their JSON files and since I have your attention and would like to contribute then. I would like to offer the option on the Verifier to have a more human readable error that shows the line/column position.

I saw that you offerred a decent solution here that I can work on.

So, can I start working on a PR ? and if so are there any guidelines on how I can edit the Verifier class ?

uhop commented 1 year ago

I am not sure that Verifier is the right place for it. The problem of bad files is generally unsolvable. For example, I mentioned one case I dealt with before: https://github.com/uhop/stream-json/issues/110#issuecomment-1134906599

While Verifier returns a line/column position as well as an offset, I am leaning towards a set of utilities to deal with the problems and, most importantly, fix them, rather than do everything in one go as stream components do. But all problems are different and so far not generic enough. That's why I didn't open-source utilities I wrote for my case described in the link above.

Likewise, I don't want to make extensive changes to a simple utility just because someone generates JSON manually. Today it is a missing comma, tomorrow, it is a dot instead of a comma, a trivially unmatched braces, and so on. We cannot possibly cover all cases. Personally, I feel that Verifier with its trivial true/false determinism is enough to verify a stream.

In general, I would suggest working on a tool you need anyway, then evaluating if it is needed to be open-sourced. If it is generic enough it can be open-sourced in your own repository and I can reference it from the wiki. If the patch is trivial and generally useful, I can consider it for adoption. Anything accepted becomes my responsibility and I commit to supporting it, so I hope you understand why I am hesitant to extend the project with rare use cases.