Open tracker1 opened 3 years ago
Made several edits to first post for clarification and some simplification... hope this makes sense and would help for defining the expected behavior.
What a nice notification to receive a WFH friday morning :)
This is a great suggestion, and a lot more fleshed out than the few words I posted. I have a few points that I'd like to bring up:
If this API is intended to be public, and I think it should be, might we want to use a more developer friendly name for the ___COMMENTS___
property? Generally speaking I don't seem to see triple underscores being used a lot in the JS community, and my first thought was that it was a double underscore. A common convention is to use a single underscore to indicate a private property (_comments
), but I can see this causing a naming conflict. Perhaps the middleground, double underscores?
I guess the safest way of doing it would be with symbols (myToml[TOML.Comments]
), but that would pose issues with both serializing and supporting older versions of Node.
For multiple interspersed comments/lines, the first block with a blank line after, and no blank after the node will belong to the previous node's AFTER, all additional nodes will be part of the next node's BEFORE, The document nodes will be before/after as such.
How would the following TOML be handled in that case?
# Comment 1
# Comment 3
[foo]
bar = 123
# Comment 4
# Comment 5
# Comment 6
I'm specifically concerned with the last comments (5 and 6). Since there's no next node, will they be part of bar
's AFTER
, or instead be part of ___DOCUMENT___
's AFTER
?
Continuing on my last point, is there a risk that the AFTER
property will cause confusion? Take the following TOML:
[foo]
# This section contains fooy things
amount = 2
The comment obviously belongs to the section [foo]
and will en up in foo.AFTER
. Now, what about the following TOML:
[foo]
# This section contains fooy things
amount = 2
If I've understood your proposal correctly the comment will now end up inside amount.BEFORE
instead.
I see a potential of a lot of these edge cases coming up, and many of them will likely be because us humans can take a decision based on context, while a piece of code can't.
This last point is about how we will handle multiline strings. I just want to make sure that I understand how it would work.
In the case of this string: Hi,\n \nThis is a new line
the following TOML would be produced:
# Hi,
#
# This is a new line
And in the case of this string: Hi,\n\nThis is a new line
the following TOML would be produced:
# Hi,
# This is a new line
Would that be correct? If so, then this string: A\n\nB\n\nC
would result in this:
# A
# B
# C
But the issue then would be that it would be deserialized into two separate strings: A
and B\n\nC
, right? I might be all wrong here, just want to make it clear.
That turned into more than I initially thoughts. Anyways, I'm looking forward to hearing what you have to say :smile:
enumerable: false
so it won't serialize via JSON.stringify or the like by default (same for loggers)... the override option is only for changing this behavior. The triple undersores is to expressly avoid potentially conflicting.bar
would leave 4-6 as part of document's afterAnd yes, the detection could be flawed, that said, should be able to serialize/deserialze the comments relatively easily... modifying comments programatically, a bit harder.
For me the biggest issue is being able to load, edit a value, and save with comments, so my config files wouldn't lose instructional comments.
I see a lot of value in this, even if there are some edge cases that are strange. Compared to having all comments removed, having some level of comment preservation (even if lossy), would be hugely valuable!
I can see the appeal of trying to make it lossless in both directions, but that doesn't seem to be a goal of this project in the first place as it doesn't maintain whitespaces between parse & stringify either.
So, is there any appetite for doing this work?
I am fairly confident that @tracker1's suggestions around comment preservation are sound, and if you were to have any concern about backwards compat, I'd encourage adding parseWithComments
as a separate function, to avoid breaking any existing code.
(I have a stringify solution discussion https://github.com/toml-lang/toml/issues/854, which includes comment:)
Comment
Another pain is comment. Obviously we don't want a configuration file that contains comments lose all comment information after being modified by programs.
In JavaScript (I don't know whether it's possible for other languages),
[commentFor(key)]
as key in tables (this gives you a symbol as key, and the value should be the comment content string), so that the comment is after the value belong the key in the final serialization.const TOML = require('@ltd/j-toml'); const { commentFor } = TOML; TOML.stringify({ [commentFor('key')]: ' this is a key/value pair', key: 'value', dotted: { [commentFor('key')]: ' this is a dotted key/value pair', key: 'value', }, table: { [commentFor('header')]: ' this is a table header (but it cannot be a table in an array of tables)', header: Section({ }), }, });
key = 'value' # this is a key/value pair dotted.key = 'value' # this is a dotted key/value pair [table.header] # this is a table header (but it cannot be a table in an array of tables)
@LongTengDao I do like the appearance, for the manual access formatting... that said, that type of property declaration would wind up in the object model, and parse/stringify would get noisy going to/from JSON...
Perhaps...
const obj = TOML.parse(...);
TOML.commentBefore(obj, "dotted.key", "This is before the dotted key/value pair");
As a convenience access method... where the third parameter if set, sets the value... if it's only the first two, it gets the value... and to clear/remove a comment the third parameter is an empty string.
Alternatively...
const comment = TOML.commentBefore`${obj}.dotted.key`(); //get
TOML.commentBefore`${obj}.dotted.key`("new value..."); // set
TOML.commentBefore`${obj}.dotted.key`(""); // clear
Maybe just TOML.comment
and TOML.commentAfter
as the method names... wether using processing syntax or not.
In addition to ideas from #22 I wanted to make some specific suggestions about comments support.
preserveComments
when not truthy will exclude the___COMMENTS___
property altogetherenumerableComments
(See Object.defineProperty.) would allow for the___COMMENTS___
property to be enumerable. This would allow for preservation viaJSON.stringify
, default would befalse
.toml.stringify
would explicitly check for values and include them if non-whitespace comments are present.The property for comments would literally be
result.___COMMENTS___
; (three underscores before and after)The format for comments would be:
result,___COMMENTS___.${entry}.BEFORE
- entries and documentresult,___COMMENTS___.${entry}.INLINE
- only for entriesresult,___COMMENTS___.${entry}.AFTER
The entry key for the document is
result.__COMMENTS__.___DOCUMENT___
What would be returned, should be the same object structure as before with an optionally enumerable (false by default)
___COMMENTS___
property, three underscores, all caps.As mentioned in @scriptcoded 's comment Multiple comments together should be merged with
\n
.Edge cases... For multiple interspersed comments/lines, the first block with a blank line after, and no blank after the node will belong to the previous node's AFTER, all additional nodes will be part of the next node's BEFORE, The document nodes will be before/after as such.
Becomes:
NOTE: spaces after
#
preserved, no whitespace lines preserved except for the case of\n \n
in between comment blocks, but any whitespace on the original, otherwise blank lines are removed.With two
\n\n
together the output blank line stays blank... if there was a block between, even if no whitespace, should parse with\n \n
space between the\n
and will output#
with no space on the blank line, even if the original had one. This may not align with the original input, but is probably the best option for dealing with this edge case.Comment sections beginning with
\n
mean there are blank lines at the top of said block, similarly for the end, any dangling\n
indicate blank lines.becomes
block\n
whitespace on blank line removed, but dangling\n
is blank line.\n\n comment
becomes:
empty line(s) at start.
Carriage returns (
\r
) are dropped.The middle line (
#
) with whitespace, will result instart\n \n more
where the extra whitespace is dropped to a single space, similar for otherwise blank lines being replaced with nothing, but having extra\n