smallrye / smallrye-config

SmallRye Config - A Java Configuration library
Apache License 2.0
164 stars 118 forks source link

HOCON properties rendered incorrectly #1225

Open kdubb opened 1 month ago

kdubb commented 1 month ago

In HOCON, it's Path.render quotes anything it doesn't like, including things like % for profiles, [] for lists/arrays and / used in projects like SmallRye FaultTolerance.

Basically it produces a lot of quoted sections to the properties it produces that make them incorrect. Some examples:

There doesn't seem to be a way, currently, to represent these properties in HOCON files.

kdubb commented 1 month ago

I wrote an internal version of the module to quickly fix these issues for us. The solution is fairly simple with a few caveats because there is seemingly no way to "render" a key path in code consuming a config from the library; it always renders them itself before returning them.

So the fix looks like this:

  1. Generate flattened keys as is currently done.
  2. Use the ConfigImplUtil.parsePath to retrieve the path from flattened keys. a. Requires generating indexed paths with the [0] part quoted to be a valid HOCON path.
  3. Rebuild the path in a representation compliant with SmallRye Config. a. Quote any empty path elements or those having matches in [.\s\"] b. Join the elements using '.'

If I find the time I can look at making a PR for this but I'm pretty swamped at the moment.

radcortez commented 1 month ago

Thank you for the report.

I was considering deprecating and removing the HOCON source due to:

We have no plans to invest effort in an abandoned library unless it is super popular, which is not the case. For the time being, we do accept any PRs that improve the implementation, of course.

Would you consider moving away from the HOCON source?

kdubb commented 1 month ago

I've read those as we use HOCON. As the previous maintainer has said a few times, the library works with a few known issues but no real showstoppers. All in all, it's much better than YAML for config (in practice YAML trades repetition for lots of unnecessary indentations).

That being considered, I would say that if SmallRye Config's YAML support included breaking keys with dots in them into key "paths"... we'd switch to that in a hearbeat. That's really "the thing" that makes HOCON better.

So a YAML format that treated:

this.is.a.key.path: the-value

as if it were

this:
  is:
   a:
    key:
     path: the-value

I would see no value in HOCON.

kdubb commented 1 month ago

Not sure what SR Config is usign for YAML parsing but I know that being able to detect/specify quoated vs. unquoted keys is usually available (unless your using a meta-library like Jackson). So the above could be implemented and allow leaving quoted (single or double) keys alone.

radcortez commented 1 month ago

We use snakeyaml as the parse which handles that case.

But we do handle it differently for a few reasons: https://github.com/quarkusio/quarkus/issues/11744

kdubb commented 1 month ago

Yeah it's why we stopped using YAML in Quarkus, as I said you're trading repitition for indentation/newlines.

I'm wondering with we could add a "SimpleYAML" version to SmallRye and Quarkus. Where you get "simplified YAML" where common sense things like treating dots as path segments is done. It could be a configuration key but configuring your config source seems to make less sense than including quarkus-config-simple-yaml and getting these features.

dmlloyd commented 1 month ago

I think that pressing this idea as "common sense" distracts from the fact that doing this flattening would be incorrect, even if you hide it behind a mode. The reason is that a dot . is a valid part of a configuration segment, and so deciding that it instead acts as a configuration separator for syntactic convenience means that many segments are now non-representable. Since this is not a part of core YAML, there is not any escape mechanism to make this work correctly, so if we wanted to support this, we'd need to layer on another escape mechanism, which would lead to other problems.

To me, "common sense" means "correct".

kdubb commented 1 month ago

Ok, common sense might not be the correct terminology, but it certainly makes it simpler to use and achieves a configuration language that's much less wordy.

What's not representable if quoted keys are treated as a single segment?

dmlloyd commented 1 month ago

Any segment with a . in it (like a class name or logging category) would not be unambiguously representable.

kdubb commented 1 month ago

How, if it's quoted it's treated as a single segment, if it's unquoted, dots are segment separators. I'm not sure how that is unambiguous.

dmlloyd commented 1 month ago

Quoted how?

kdubb commented 1 month ago

The same way you quote keys to include unsupported characters (e.g :) currently.


"this.is.one.segment": value
'this.is.one.segment:including this': value
this.is.multiple.segments: value
# equivalent to
this:
  is:
    multiple:
      segments: value
dmlloyd commented 1 month ago

YAML makes no distinction between keys that are quoted and non-quoted by specification. This means it's generally not possible to distinguish these cases (i.e. whether or not a key was quoted) from a YAML-compliant parser like SnakeYAML. The solution is also incomplete, since things like foo."bar" would (rightly) not be considered to be valid YAML.

kdubb commented 1 month ago

You are arguing the specification of YAML and I am not. The spec clearly doesn't treat dotted keys as path segments. That doesn't mean it isn't a very useful thing to get the best of both worlds; removing the repetition of .properties files without the unnecessary indentation and newlines required in YAML.

Regardless of the spec, SnakeYAML and most other YAML parsers do distinquish between quoted and unquoted keys during parsing. Mostly because you generally want some way of specifying this during serialization and they provide the same information during parsing.

So quoted vs unquoted is a nice simple way to achieve all this.

With regard to foo."bar".baz.. that's a valid key in YAML (the quotes will be included) and it's easy to parse that as a single segment.

dmlloyd commented 1 month ago

The specification is relevant for two reasons.

First of all, because the specification treats these cases as equal (AFAICT), then interpreting the document differently is in violation of the spec. The possible ramifications of this include (but are not limited to) having config be broken because some YAML-processing tool reinterpreted keys and de-quoted things that do not, by spec, need to be quoted.

The second reason is purely practical: I don't see how we can possibly know whether or not a key was quoted or unquoted after we get it out of the parser. So if that is indeed the case, we'd either have to hack into the parser or write our own, neither of which seems very appealing.

kdubb commented 1 month ago

As I said previously, SnakeYAML surfaces parsed strings (including keys) as scalars with a ScalarStyle which includes DOUBLE_QUOTED, SINGLE_QUOTED, and PLAIN (amongst others).

No hacking needed.

radcortez commented 1 month ago

From what I could tell, the ScalarStyle only applies to the Dumper and not the Parser.

When I try to read foo.bar or "foo.bar", the resulting key does not contain the quotes, but maybe I'm missing something. Can you also have a look?

Additionally, if we were to support it, I believe it would break the existing configuration. How would you solve it?

dmlloyd commented 1 month ago

Even if we managed to solve the second problem, we still have the first one: that any random tooling might transform the file incompatibly because we're assuming things that are not promised in the specification. I wouldn't be opposed to some other quoting scheme - but such a scheme would have to be spec compliant at the very minimum.

kdubb commented 1 month ago

From what I could tell, the ScalarStyle only applies to the Dumper and not the Parser.

The Parser produces Events, ScalarStyle is a property on the ScalarEvent.

https://github.com/snakeyaml/snakeyaml-engine/blob/0579bdb74b5d48cf4470a85309e4da1f1593fea4/src/main/java/org/snakeyaml/engine/v2/events/ScalarEvent.java#L31

Even if we managed to solve the second problem, we still have the first one...

I admire your adherence to the standard, but I'm very much talking about "stretching it" 😆

This is why I suggested a completely different module, so that never the twain shall meet ( (e.g. SimpleYAML but now I like NotSoIndentyAndNewlinyYAML for the name).

The goal is to take small liberties with YAML, and use what the Parser/Dumper already provide to make the developer experience measurably better. Yep, we're doing things non-standard, but I can't see the drawback. Given that we're getting requests for keys from SR Config, not providing keys [1], we should be able to map whatever request comes in, to a "nicer to type" value.

Honestly, given the absolutely broken state of HOCON in SR Config; the idea of replacing it with a YAML that gives you nearly all the same convenience without relying on (technically) unmaintained libraries, seems like a real win.

The only drawback I see is if I load YAML into a "YAML Parser" I'm going to get keys like "this.is.a.key.path" instead of this: { is: { a: { key: { path: <value } } } }. So... don't do that and expect to get anything but key paths (cause the objects won't have been created at each segment. The quoting issue really doesn't bother me. this."is.a".key.path would show up exactly as you see it there. Anybody loading these YAML files should know to process them accordingly (or just use SR Config).

An actual nice name for it might be ConfigYAML.

[1] - I know that SR Config does report keys to show config (e.g. in Quarkus) but the actual resolution seems to only rely on making requests of the ConfigSources. I do know the importance of showing the keys correctly too. When trying to figure out why our HOCON files weren't working we were able to plainly see that there were keys that were misrepresented.

kdubb commented 1 month ago

Upon reflection, maybe ConfigYAML is too... something. I could see people getting them mixed up if there were both a quarkus-config-yaml and a quarkus-config-configyaml extension 🤷‍♂️... I still like calling it ConfigYAML though haha

kdubb commented 1 month ago

When I get some time, I will almost definitely implement this in our project, I've already got a HOCON config extension to get it working correctly.

Maybe the best course of action is to let me find the time to do that and report back. Once it's working we should learn pretty quickly if it causes any issues with editing, saving, etc. It wouldn't be a win if you're fighting with the IDE every you edit or save a "ConfigYAML" file.

kdubb commented 1 month ago

Back to the original topic of HOCON. I really think you guys should fix or pull it. I'd vote for fix. It's not a terribly hard fix and HOCON is an accepted configuration format, but I would understand if you chose to drop it altogether.

radcortez commented 1 month ago

The Parser produces Events, ScalarStyle is a property on the ScalarEvent.

Yes, but currently, from what I could tell, you would have to reimplement pieces of the parser to make it work. At least I couldn't find a way to get the quoted keys out of the box. I also looked into the snakeyaml issue tracker, and I couldn't find any related discussion. It seems it was never considered or required.

I understand the frustration, and we have had many users complain about the same thing in the past. I would like to improve this but don't see how to do it without a significant breaking change. Consider:

quarkus:
  log:
    category:
      io.quarkus.category:
        level: INFO

Translates to quarkus.log.category."io.quarkus.category".level. Where io.quarkus.category is a dynamic segment of the configuration (the logger). We know what composes the segment because it is quoted. If we change the behaviour to represent a multi-path, then it becomes quarkus.log.category.io.quarkus.category.level, and you would required to mandate io.quarkus.category in the yaml file to be quoted. It is all doable, but unfortunately, it is a significant breaking change.

radcortez commented 1 month ago

Back to the original topic of HOCON. I really think you guys should fix or pull it. I'd vote for fix. It's not a terribly hard fix and HOCON is an accepted configuration format, but I would understand if you chose to drop it altogether.

Feel free to submit a PR; I'm happy to review and merge it. Thank you!