Will custom type syntax be good for TOML health?

toml-lang / toml

Tom's Obvious, Minimal Language

https://toml.io

MIT License


Will custom type syntax be good for TOML health? #603

Closed: LongTengDao closed this issue 1 year ago

LongTengDao commented 5 years ago

key = (compute) ' 5 * 60 * 60 '

key = (toSecond) ' 5h '

key = (toTable) [
  ['name', 'age', 'sex'],# head
  ['Jack', '10', 'male'],# item 1
  ['Max', '20', 'male'],# item 2
]

key = (toDOM) '''
  <div>
    <span></span>
  </div>
'''

[table] ('and other custom transform type')

I don't mean that custom type syntax should replace the standard types. I'm just wondering whether exploring de facto standards this way would help the development of standard types, with less hard-to-settle discussion, and keep these requirements from becoming dialects that conflict with the spec in the future.
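TOML has no tag syntax today, so an application that wants transforms like the ones proposed above would have to apply them itself after parsing. Below is a minimal sketch, assuming a hypothetical registry keyed by two of the tag names from the examples (`toSecond`, `toTable`). The function names and the `h`/`m`/`s` suffixes are illustrative assumptions, not part of any spec. `(compute)` and `(toDOM)` are left out on purpose, since evaluating arbitrary expressions or markup is well beyond a small sketch.

```python
import re

def to_second(text: str) -> int:
    """Hypothetical `toSecond`: convert a duration like ' 5h ' or '90m' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([hms])\s*", text)
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * {"h": 3600, "m": 60, "s": 1}[unit]

def to_table(rows: list) -> list:
    """Hypothetical `toTable`: turn a header row plus item rows into a list of dicts."""
    header, *items = rows
    return [dict(zip(header, item)) for item in items]

# Hypothetical registry: tag name -> transform applied by the application, not the parser.
TRANSFORMS = {"toSecond": to_second, "toTable": to_table}

def apply_tag(tag: str, value):
    """Look up the transform for a tag and apply it to an already-parsed value."""
    return TRANSFORMS[tag](value)
```

For example, `apply_tag("toSecond", " 5h ")` gives 18000, the same number as `5 * 60 * 60` in the `(compute)` example. This design also shows where the interpretation lives: the parser would only carry the tag name, and each application decides what the tag means.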

eksortso commented 1 year ago

@tintin10q You said:

You understand my view, although I don't think "encourage" is the right word. Even if you explicitly discouraged abuse in the spec, the ability to abuse it would still be there. At some point that will go wrong, with people wanting to do "clever things" with their parsers, and then we lose the principle that "You really should only have to look at the TOML spec to know what your TOML file will do."

That ability for abuse is still there, but any such abuse would make the abusing parser non-conformant. That's the most that we can do, really. If such abuse persists, then we could either adopt their changes into the standard or refuse to condone them, making appropriate modifications in either case.

We can make it more difficult for "clever" solutions to take root. If #950 gets merged, for instance, then parsers cannot mess with configurations by looking for and reading comments. So any "clever" solution would have to rely on non-standard syntax (like type tags) or unusual naming conventions or some such voodoo to do clever things, for better or worse.

and some consumers may read those comments and make changes to their configurations

By "consumers", do you mean a parser implementation or an application?

I meant post-parsing end-user applications when I said "consumers."

Let's stop repeating ourselves. I don't know what will happen with the tag discussions posed here. I had an idea which may be more confusing than it needs to be, but it may serve an important purpose. What if we took the parenthetical syntax, the words in round brackets like (min), and just treated them like inline comments? A line like timeout (min) = 3 would just assign 3 to timeout, but the user would be informed that the number assigned refers to a quantity in minutes.

"Clever" users might be tempted to write timeout (seconds) = 30, then complain when their requests take ten times longer to bail. So the inline-comment idea may be lousy. I still think such comments would have their purpose (and would prevent overly long key names like timeout-minutes), and the idea may be worth exploring under a new suggestion issue.

tintin10q commented 1 year ago

I don't think timeout-minutes is overly long. I think timeout-minutes = 3 makes more sense than your proposed alternative, timeout (min) = 3. timeout-minutes is only 4 characters longer, and it's one less concept people have to learn. Also, if you put the same unit name in the comment, timeout (minutes) = 3 is actually longer than just having the unit in the key name. I think it is good practice to put the unit of something in the name whenever possible. That way the application will also know better what the unit is.

That's the most that we can do, really. If such abuse persists, then we could either adopt their changes into the standard or refuse to condone them, making appropriate modifications in either case.

I think the best option is just to ignore non-conformant parsers and potentially learn from the ideas they come up with.

Inline comments by themselves might be a good idea, but I would not use a separate () syntax for them. First, there is already a comment system, so why not just extend that? Second, () are very often used for other things, so I wouldn't use them for comments.

A better way to do inline comments is to just say that a comment ends when you encounter another #.

So like this:

timeout # seconds # = 60
timeout #minutes# = 1

This does make parsing harder, though, because now you have to keep track of when you are in a comment. I also think that timeout = 30 # seconds is just as clear and doesn't require an extension to the language. But timeout-seconds = 30 is still better.

eksortso commented 1 year ago

Bracketing comments between hash signs is a non-starter because it will break any comment with a # inside it. This is why I proposed a different syntax, and I already acknowledged problems that could arise with that syntax.
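To make the objection concrete, here is a sketch of the hypothetical "a comment closes at the next `#`" rule. It is not a real parser, just a function that keeps the text outside each `#...#` pair. It handles the intended example, but a comment that itself contains `#` leaks part of its text back into the value.

```python
def strip_hash_bracketed(line: str) -> str:
    """Hypothetical rule from the thread: a comment opens at '#' and closes at the next '#'.

    Keeps only the text outside each '#...#' pair (the even-indexed segments).
    """
    segments = line.split("#")
    return "".join(segments[0::2])

# Intended use: the bracketed comment disappears.
intended = strip_hash_bracketed("timeout #minutes# = 1")

# Ordinary comment that mentions an issue number: '603' escapes the comment.
broken = strip_hash_bracketed("retries = 3 # see issue #603")
```

Here `intended` is `"timeout  = 1"`, but `broken` is `"retries = 3 603"`, which would no longer be a valid value.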

I was trying to use a simple example to explain how a template writer could put units as comments after key names. There are more complicated key names than timeout-minutes after all, and my modest proposal (which I've decided not to make a PR for) doesn't prevent users from sticking unit names as suffixes onto key names.

LongTengDao commented 1 year ago

I have a meta question.

size (K) = 1 (M)

What should this get?

1_000_000_000
1_000
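The two candidate answers correspond to two different readings of the tags. As a worked sketch, assuming K and M are the decimal SI prefixes (10^3 and 10^6), which the question itself does not specify:

```python
K = 1_000      # assumed decimal kilo
M = 1_000_000  # assumed decimal mega

# Reading 1: both tags multiply the value, i.e. 1 * K * M.
multiplied = 1 * K * M

# Reading 2: the value is "1 M", expressed in the key's unit K.
converted = 1 * M // K
```

Reading 1 gives 1_000_000_000 and reading 2 gives 1_000. The syntax alone does not say which reading is meant.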

jeff-hykin commented 1 year ago

I have a meta question.
size (K) = 1 (M)
What should this get?

If this is asking for the output of the TOML parser, even assuming tags were implemented, I would expect/hope that the output structure is still { "size": 1 }.

The point, or what I believe makes tags useful, is precisely that they don't change the structure. A number, that happens to be a unit of time, is still structurally a number (not a table, or a list) so if we want to keep the structure, but add the additional info of "minutes", that's where tags become relevant.

As is true for most current YAML parsers handling documents with tags, the program still receives the plain/normal structure by default. For compatibility across TOML parsers, it wouldn't make sense for TOML to interpret the tags and manipulate the structure. If the program wants non-structural information, whether tags or comments (for round-tripping), it would make sense for that info to be provided separately.

E.g.

doc = toml.parseDocument("thing.toml")
doc.data  # { "size": 1 }
doc.tagForValue(["size"])  # "M"
doc.tagForKey(["size"])  # "K"

Without tags, two programs must "just know" that timeout is in seconds. Tags don't remove the fundamental need for interpretation; both programs still need to "just know" (i.e. coordinate) that "ms" means milliseconds and not microseconds. But, on top of being human-visible, tags make that coordination easier: it's simpler for two programs to agree on what an "ms" tag means than to agree on the interpretation of every single timeout, delay, offset, start time, end time, etc.
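The coordination point can be illustrated with a shared vocabulary. This is a sketch, assuming a hypothetical table of duration tags that both programs agree on. The tag names and `interpret_duration` are illustrative, not taken from any spec. Once the table is shared, every tagged timeout, delay, or offset is interpreted the same way, instead of each field needing its own convention.

```python
from datetime import timedelta

# Hypothetical shared vocabulary: both programs agree on what each unit tag means.
DURATION_TAGS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "min": timedelta(minutes=1),
    "h": timedelta(hours=1),
}

def interpret_duration(value: float, tag: str) -> timedelta:
    """Resolve a tagged number into a timedelta using the shared vocabulary."""
    try:
        return value * DURATION_TAGS[tag]
    except KeyError:
        raise ValueError(f"unknown duration tag: {tag!r}") from None
```

For example, `interpret_duration(300, "ms")` equals `timedelta(milliseconds=300)`. An unknown tag such as `"us"` raises `ValueError` instead of being silently misread.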

What should this get?

So, if this is asking for the program output (instead of the TOML parser output), it's like asking what units the program should get for { timeout = 300 }.

It just doesn't matter: the program could interpret the 300 as an enum value, or as 300 kelvin, or the timeout value could be ignored entirely. The same goes for the (M) and the (K).

jeff-hykin commented 1 year ago

I think the real question is: do the TOML maintainers want to allow non-structural information?

If yes, then a human-readable syntax can be debated (and probably solved), and a write-with-tag method can be devised.

If no, then this issue should just be closed.

tintin10q commented 1 year ago

In my opinion allowing non-structural information is not a good idea and the issue should be closed.

From: Jeff Hykin; Sent: Sunday, 22 January 2023, 18:02; To: toml-lang/toml; CC: tintin10q, Mention; Subject: Re: [toml-lang/toml] Will custom type syntax be good for TOML health? (#603)


eksortso commented 1 year ago

In my opinion allowing non-structural information is not a good idea and the issue should be closed.

@tintin10q I don't entirely agree with your take on non-structural information; my reasons would take too long to explain succinctly here.

But with all due respect to @LongTengDao, who opened this suggestion, we need to start fresh. Let's close this issue; any of the topics we discussed here that are worth reintroducing can be given better focus in new issues.

pradyunsg commented 1 year ago

I think the real question is do the toml maintainers want to allow non-structural information?

Based on reviewing the discussion here, I don't think tag-style rich information is a good idea. Quoting from the objectives of the language:

TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages.

Neither of these is feasible with tag information. You would need to either (a) modify the serialised data or (b) provide tag-like information via a side channel. Both of those are no-gos from my perspective.

size (K) = 1 (M)
What should this get?

An error? I think any behaviour other than an error here is going to be non-trivial to explain.

Let's close this issue, and any of the various topics that we discussed here, if they're worth reintroducing, can be given better focus with new issues.

I agree. If someone wants to pick out a specific piece from the discussions here, please open a new issue for that with a specific proposal for what you want to change (or at least specific usecases to focus on) so that we can have a less meandering discussion. :)

As always, thanks for a productive discussion here, folks! Even though the conclusion seems to be "no action, and more discussion", a lot of what has been discussed here is quite useful. :)