ibireme / yyjson

The fastest JSON library in C
https://ibireme.github.io/yyjson/doc/doxygen/html/
MIT License
3.04k stars 262 forks source link

Add (optional) support for preserving and round-tripping comments. #121

Open TkTech opened 1 year ago

TkTech commented 1 year ago

More or less summed up by the title :)

There are quite a few (very) popular projects like VSCode that store their configuration files in JSON with comments. yyjson can currently read these with no problem by enabling comments, but this simply strips them.

It would be ideal if these could (optionally) be stored and serialized to support round-tripping of these types of configuration files. It seems unreasonable to expect byte-perfect round tripping, since yyjson wasn't built for that and it would likely be too much work.

I see 2 options that appear reasonable, IMO:

  1. Add a comments pointer to yyjson_mut_val, which is simply a character blob that gets prepended when serializing that yyjson_mut_val (and so must include the "//" in front of each line in the string). Existing code wouldn't break, and code that doesn't care about comments could simply ignore it. When the parser encounters a comment, just start storing it up until the next yyjson_val is a found and then add it. As an edge case, if a comment occurs outside the documents root, it becomes a comment on the root node (even if it occurred after the root).
  2. Add a YYJSON_TYPE_COMMENT which is added to the document. This feels more difficult to get right. How would you add it to an object for example? It has no key.
ibireme commented 1 year ago

Hi TkTech, thank you for your suggestion.

I also wanted to preserve comments before but found it difficult to implement. Adding a new TYPE would not work, because comments are not structured data, and this would break the function of get_len() and iterator. Adding a new member to val would work, but this would increase memory usage and decrease performance. I think it is not worth it to preserve comments at the cost of performance.

TkTech commented 1 year ago

I agree with you for the general use case, so how about as a compile-time flag that is disabled by default? My example use case is a VSCode linter. In such a case I don't care about peak performance, but the flexibility and correctness of yyjson's parser and its support for _RAW types and comments.

I believe it would still perform very well, in real time from the perspective of a slow human brain.

ibireme commented 1 year ago

Maybe I can try it out and see how it actually performs.

TkTech commented 1 year ago

If you don't have a preference for how this is done, I'm happy to take a stab at a proof-of-concept PR.

ibireme commented 1 year ago

If you don't have a preference for how this is done, I'm happy to take a stab at a proof-of-concept PR.

Thanks. I would be happy to take a look. Sorry for the slow response recently.

TkTech commented 1 year ago

No rush mate, you don't owe anyone anything:)