Open Tisten opened 2 years ago
Besides avoiding excess newlines, writing pointer payloads "inline" instead of at the end of the file makes it much easier for a human to read.
Personally I like the "excessive" newlines and find them easier to read. I do see your point on the pointers, however! How do you represent an inlined pointer if it is pointed to more than once? Is the "ptr" : {} element some kind of marker that can have an ID? And in the Cap'n Proto data I don't see any other pointer references, just a list?
Cap'n'proto flattens (i.e. removes) all pointers except AnyPointers (unions) when going to json, so circular references don't work at all and all data gets duplicated. So to keep the structural integrity when pointers are referenced from multiple places, they still need to be identifiable, i.e. have a unique name or tag, and the inlined data could be written in either all or just one of the places, e.g. where the first reference to the data is. If the data is flattened and written in all places then https://github.com/wc-duck/datalibrary/issues/14 could solve deduplicating it. And even if you went that "flatten everything" route, you would still need to abort on cyclic references and have an identifier to refer to.
I guess the same thing is true for arrays, but since they are already written without a unique name I guess that dl already flattens them even if they refer to the same pointer?
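To make the "inline at the first reference, refer by identifier afterwards" idea concrete, here is a minimal sketch in Python. It is not how dl or Cap'n Proto work today; the `__id` and `__ref` key names and the `ptr_N` tags are made up for illustration, and for simplicity every object gets a tag, even those referenced only once.

```python
import json

def to_json_tree(root):
    """Convert a graph of dicts/lists into a JSON-safe tree.

    The first time a dict is reached it is written inline with a
    hypothetical "__id" tag; every later reference to the same dict,
    including a cyclic one, is written as {"__ref": <tag>} instead.
    """
    ids = {}  # id(obj) -> tag assigned on the first visit

    def visit(node):
        if isinstance(node, dict):
            key = id(node)
            if key in ids:
                # Already written once: emit a reference, not a copy.
                return {"__ref": ids[key]}
            ids[key] = f"ptr_{len(ids)}"
            out = {"__id": ids[key]}
            for name, value in node.items():
                out[name] = visit(value)
            return out
        if isinstance(node, list):
            return [visit(item) for item in node]
        return node  # numbers, strings, booleans, None

    return visit(root)

shared = {"value": 42}
root = {"first": shared, "second": shared}
root["self"] = root  # cyclic reference back to the root

tree = to_json_tree(root)
text = json.dumps(tree, indent=2)
```

Here `shared` is written in full under `"first"`, while `"second"` and `"self"` only carry a `__ref`. Tagging only the objects that are actually referenced more than once would need a counting pass before writing.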
The two main points of the newlines are that:
It would be awesome if json formatting could be done using formatting rules similar to "clang-format", so each user can choose their own style. The more I think about it, the more I think that reformatting the json is something which can be done after DL has created the json, i.e. by piping the data to another tool. So DL could just avoid writing any whitespace, and let the formatting tool add all that. It would be slower, but if the data could be piped in chunks then formatting could mostly be done in parallel with DL's json generation, so a GB file would not require twice the time.
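As a rough illustration of the "pipe it through a formatter" idea, the sketch below reformats whitespace-free JSON one chunk at a time, so it never needs the whole document in memory. The function name `format_chunks` and the fixed style (one element per line, `" : "` between key and value) are assumptions for the example, not an existing tool or dl API.

```python
import json

def format_chunks(chunks, indent="  "):
    """Yield formatted text for an iterable of compact-JSON text chunks.

    Assumes the input has no whitespace outside string literals.
    All state is carried across chunks, so chunk boundaries may fall anywhere.
    """
    depth = 0
    in_string = False
    escaped = False
    after_open = False  # True right after '{' or '[' until the next character
    for chunk in chunks:
        out = []
        for ch in chunk:
            if in_string:
                # Copy string contents verbatim, tracking escapes.
                out.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                continue
            if ch in "}]":
                depth -= 1
                if not after_open:  # keep empty containers as {} or []
                    out.append("\n" + indent * depth)
                out.append(ch)
                after_open = False
                continue
            if after_open:
                out.append("\n" + indent * depth)
                after_open = False
            if ch in "{[":
                depth += 1
                out.append(ch)
                after_open = True
            elif ch == ",":
                out.append(",\n" + indent * depth)
            elif ch == ":":
                out.append(" : ")
            else:
                if ch == '"':
                    in_string = True
                out.append(ch)
        yield "".join(out)

data = {"a": 1, "b": [1, 2], "c": {}, "s": "x,y:{"}
compact = json.dumps(data, separators=(",", ":"))  # no whitespace at all
chunks = [compact[i:i + 5] for i in range(0, len(compact), 5)]
formatted = "".join(format_chunks(chunks))
```

In a real pipeline the chunks would come from stdin or a socket rather than a list, and the style decisions would be driven by a user-supplied rule file.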
Yes, member-data alignment I wouldn't mind either, if what you mean by that is:
{
  "member_1" : 1234,
  "short"    : 3456
}
Also, I think vectors of numbers are single-line, right? Because if they are not, I think they should be.
But as you say, formatting is highly personal, so being able to pipe it through some kind of formatter might be the best solution. However, the current API does not support streaming output, and I think it would require quite a bit of new API that would probably "break" the current API structure.
But an "unformatted" json output, would that just be no newlines at all, basically just a big long single line?
Yes, arrays of primitives and pointers are always single line, even when they are epic in length.
And yes, you understood the data-alignment correctly.
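For reference, the two rules discussed here (aligned member colons, and arrays of primitives kept on one line) can be sketched in a few lines of Python. `format_aligned` is a hypothetical helper written for this example, not part of dl, and it assumes all object keys are strings.

```python
import json

def format_aligned(value, indent="  ", level=0):
    """Format JSON-like data with aligned ':' and single-line primitive arrays."""
    pad = indent * (level + 1)
    if isinstance(value, dict):
        if not value:
            return "{}"
        keys = [json.dumps(k) for k in value]  # quoted member names
        width = max(len(k) for k in keys)      # column where ':' is aligned
        lines = [
            f"{pad}{k.ljust(width)} : {format_aligned(v, indent, level + 1)}"
            for k, v in zip(keys, value.values())
        ]
        return "{\n" + ",\n".join(lines) + "\n" + indent * level + "}"
    if isinstance(value, list):
        if all(not isinstance(v, (dict, list)) for v in value):
            # Arrays of primitives stay on one line, however long they are.
            return "[" + ", ".join(json.dumps(v) for v in value) + "]"
        items = [pad + format_aligned(v, indent, level + 1) for v in value]
        return "[\n" + ",\n".join(items) + "\n" + indent * level + "]"
    return json.dumps(value)

data = {"member_1": 1234, "short": 3456, "values": [1, 2, 3]}
text = format_aligned(data)
```

With the sample data above, all three colons land in the same column and `values` is written as `[1, 2, 3]` on a single line.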
In my mind the unformatted style has no whitespace or newlines at all: the smallest memory footprint to start the reformatting from, with no need to strip whitespace before adding new.
The implementation used by cap'n'proto to make the formatting simple is a "string tree", where all elements are leaves in the tree, parented by the lists and objects owning them. Each branch can provide the summed length of all its children, making it trivial to know which lists are appropriate to keep on one line, and which elements to insert newlines and indentation between. It makes it easy to insert sub-strings into the tree while building it, and it can also reduce the memory footprint since identical strings can be reused instead of duplicated.
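The idea is easier to see in a toy version. The sketch below is not KJ's `StringTree` API; the `Leaf` and `Branch` classes and the `max_width` threshold are assumptions made for this example. Each branch sums its children's lengths once, and that sum alone decides whether it is rendered on one line or split. For brevity the check ignores the current indentation.

```python
class Leaf:
    """A finished piece of text, e.g. a number or a quoted string."""
    def __init__(self, text):
        self.text = text

    def length(self):
        return len(self.text)

    def render(self, max_width=40, indent="  ", level=0):
        return self.text

class Branch:
    """A list or object: open/close brackets plus child nodes."""
    def __init__(self, open_, close, children):
        self.open, self.close, self.children = open_, close, children
        # Single-line length: brackets + children + ", " separators.
        self._length = (len(open_) + len(close)
                        + sum(c.length() for c in children)
                        + 2 * max(len(children) - 1, 0))

    def length(self):
        return self._length

    def render(self, max_width=40, indent="  ", level=0):
        if self._length <= max_width:
            # Short enough: everything below also fits, keep one line.
            inner = ", ".join(c.render(max_width, indent, level)
                              for c in self.children)
            return self.open + inner + self.close
        pad = indent * (level + 1)
        inner = ",\n".join(pad + c.render(max_width, indent, level + 1)
                           for c in self.children)
        return self.open + "\n" + inner + "\n" + indent * level + self.close

short = Branch("[", "]", [Leaf("1"), Leaf("2"), Leaf("3")])
long_ = Branch("[", "]", [Leaf('"a fairly long string value"'),
                          Leaf('"another long string value"')])
outer = Branch("[", "]", [short, long_])
text = outer.render(max_width=40)
```

Because leaves are plain objects, the same `Leaf` can be attached under several branches, which is where the memory saving for repeated strings would come from.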
Unfortunately it is terribly modern code, very big interfaces, very few lines in implementation and utterly impossible to understand by reading. Source here: https://github.com/capnproto/capnproto/blob/3b2e368cecc4b1419b40c5970d74a7a342224fac/c%2B%2B/src/kj/string-tree.h#L69 https://github.com/capnproto/capnproto/blob/3b2e368cecc4b1419b40c5970d74a7a342224fac/c%2B%2B/src/capnp/stringify.c%2B%2B#L57
When I compared dl to cap'n'proto, the most striking thing cap'n'proto was better at was the awesome json formatting. Example in cap'n'proto:
And the same in dl: