lightbend / config

configuration library for JVM languages using HOCON files
https://lightbend.github.io/config/
6.15k stars 967 forks source link

An API for load/edit/save .conf and .json files #272

Closed havocp closed 9 years ago

havocp commented 9 years ago

Continuing the discussion from #149 , once we merge #271 we will have all of the information we need to save the file again in memory.

Some initial thoughts...

The currently-private API in https://github.com/typesafehub/config/blob/master/config/src/main/java/com/typesafe/config/impl/Tokens.java is pretty awful as a public API, with no real thought to clarity, namespacing, extensibility, or anything of that nature.

I also recently messed with SimpleConfigOrigin in #267 and discovered that it's quite a can of worms to change that thing. Not impossible but not much fun either. I'm also a little concerned about the memory footprint of hanging a bunch of tokens off of every ConfigOrigin, when most apps won't use the info at all. (A potential solution would be to drop the tokens at the first use of withFallback.)

But, I'm not sure we even need the tokens on the origin, because if we have a ConfigDocument kind of API, it could simply support getting the nodes at a given path.

What I don't know yet is what the ConfigDocument or ConfigNode API should look like, exactly.

It could be very limited, supporting only setting paths to values, parsing the doc into a config, and rendering:

interface ConfigDocument {
   // replaces tokens at path with new tokens representing the given value
   ConfigDocument setValue(String path, ConfigValue newValue);
   // parses tokens into a Config and returns it
   ConfigObject parse(ConfigParseOptions options);
   // render doc by rendering tokens as originally formatted
   String render();
}

Questions already arise:

Still, a minimal API like that is potentially good enough for load/edit/save ? If so we can go with YAGNI and not do a lot more to start ...

The thing I'm wondering whether we need, would be a ConfigNode kind of thing - perhaps slightly higher-level than tokens, or perhaps almost the same as our current set of tokens, I'm not sure - this would allow looking at substitution expressions, comments, and other stuff and manipulating them sort of like the DOM. This may be needed for something like #225, but it's a lot more API surface area to get wrong. You would also need an API like this to write a HOCON formatter, for example.

It could be worth tackling the basic "load, change some values, save" use-case before the general AST use-cases, but I don't know.

MSLilah commented 9 years ago

@havocp @cprice404 So here's what I'm thinking:

Step 1: We create a new ConfigDocument class. This class will have a constructor that takes an input to be parsed. It will then proceed to tokenize its input and produce a TokenIterator.

Step 2: Create a new ConfigNode class. This will likely be an abstract class, with various subclasses representing certain types of values. These classes will contain the relevant tokens so that they can be modified or replaced as necessary. Child nodes will be stored in an ArrayList to preserve ordering.

Step 3: Write a new Parser. This Parser will be less complex than the current Parser. It will take all the tokens and construct an AST. This will allow us to find the relevant nodes when we call the setValue() method.

In cases like the below:

foo: {
  bar: "bar"
}
foo.baz = "baz"

We would NOT parse baz into being a child node of foo, and instead would have a foo.baz node on the same level as foo. This will require a more sophisticated searching algorithm but I think we can do it.

Step 4: Write the setValue() method. This will find the relevant ConfigNode and replace the necessary token. We'll probably need a setValue() method on each subclass of ConfigNode, as a number of them will likely need to be handled differently.

Step 5: Write the render() method, which just traverses the tree, rendering all the tokens into a single string. Ordering should be preserved since child nodes will be stored in an ArrayList.

How does this sound? I'm thinking this would likely only support File at first. Also, when a path that hasn't been seen before is added, it should be rendered in the correct place by the tree traversal. I'm still unsure how we would handle the case in which the path appears more than once in the same file, though.

havocp commented 9 years ago

Here's what I think the "separation of concerns" could be:

Re: the steps you mention, some thoughts:

As far as how to proceed - I think it might work best to avoid modifying the existing parser (parser2) at first, though you could borrow liberally from it; instead, focus on Parser1 and defining ConfigNode (if you like, make it a class AbstractConfigNode in the impl package, and we'll factor out a public config.ConfigNode interface later, using the same pattern as ConfigValue/AbstractConfigValue). You could make yourself a new test file ConfigNodeTest or whatever to test Parser1 and ConfigNode.

If you start there, then we can write tests and work out what the kinds of ConfigNode are and generally know what's involved here. Reading through Parser.java as it exists, it looks quite hard to refactor to a Tokens=>ConfigNode parser, but maybe pretty easy to refactor so that it parses ConfigNode instead of parsing Tokens. That is, most of the "shape" of Parser.java looks to me like it's driven by the needs of generating ConfigValue. I could be wrong as always.

Remember that the file could be considered either JSON or HOCON, and if it's JSON we need to enforce JSON rules and also ensure we create valid JSON.

A thorny issue for ConfigNode/ConfigDocument, perhaps, is how to arrange a document like this so it's easy to edit a value using the API:

foo {
  bar = 10
  bar = 11 // bar duplicated
  baz = 12
}
foo.bar = 13 // bar duplicated again, but with the path
foo.baz = 14 // and baz duplicated
foo {
  bar = 15 // not again!
}

So from the standpoint of recreating this file, you probably need a tree that matches how it's written above... a node foo with three children bar, bar, baz, then a node foo.bar, then a node foo.baz, then a node foo with one child bar.

However, that syntax-retaining tree will be annoying from an editing standpoint, and the setValue on individual nodes could be quite misleading. The "right" way to set foo.bar=42 in a file could be one of the following:

This sounds more terrible than it is, since in practice values and files you edit by machine probably won't be that complicated. So a simple rule like "remove all nodes but the last one and change the last one's value" is likely fine. That could become a method like ConfigDocument replaceAll(String path, ConfigValue) ... we do need a ConfigNode.setValue to implement this I think, but in most cases people should prefer to use a document-level setter.

Implementation-wise, remember that we use an all-immutable convention. For trees, one implication of this is that nodes can't have a parent() (since it would change every time you changed any node in the tree).

MSLilah commented 9 years ago

@havocp Hmm, alright, that makes sense. So it sounds like maybe I should start by defining a ConfigNode class, and then writing Parser1, or whatever it ends up being called, going one type of ConfigNode at a time.

In terms of inserting into the tree, I'm thinking we should leave all the nodes but change the value of the last one to 42 to preserve the original text of the file as close as possible. However, I also think removing all nodes except the last one and then changing the last one's value to 42 sounds like it might work well.

cprice404 commented 9 years ago

For our use cases, I think we pretty much frown upon (or just outright don't support) cases where someone defines the same value multiple times in the same file. So from our perspective I'd love to see us end up in a state where modifying a value would effectively remove dupes from the file... but assuming we're not the only target audience for these changes :), we'd obviously be happy to defer to whatever algorithm you think is best, @havocp .

havocp commented 9 years ago

I think the document.replaceAll kind of method will effectively just be a convenience method - you could also iterate over the nodes by hand and do what you want - but we should try to make it conveniently do what we think people will want. I think I agree with removing dups and changing the last one.

Starting with parser1/ConfigNode and a test file for same sounds good to me.

MSLilah commented 9 years ago

@havocp Here's what I'm thinking for the ConfigNode implementation:

There will be an abstract AbstractConfigNode class from which all nodes inherit, and each individual token will be wrapped in a ConfigNode of some sort, and then put in a tree structure. There will be around 5 different types of ConfigNode (class names are not final).

Does this sound sane/like what you were thinking? Also, @cprice404 you might be interested in this

MSLilah commented 9 years ago

The ConfigNodeComplexValue class will do the heavy lifting of replacing settings based on a given path. It will also have a method, children(), which will return the list of children so the user can modify it how they want.

havocp commented 9 years ago

It sounds reasonable to me; I would mostly have naming nitpicks I think. I guess I'd also say, try it and maybe we'll have to change it after we see how it goes, but it sounds like a good first thing to try.

Naming thoughts (not the final word, just ideas):

Don't get too hung up on the names; we can easily search-and-replace later. The main reason I bring it up is that I think naming can be helpful for conceptual clarity.

Mostly I'd say just give it a shot and see how it turns out, it's hard for me anyway to predict this too much a priori.

MSLilah commented 9 years ago

@havocp So I'm getting to work now on indentation when adding to a ConfigDocument, and wanted to get your feedback on approaches. Here's my idea:

This has two phases: indentation calculation and indentation addition to a value node.

To calculate the indentation, we can loop through the list of children in the object, and check for newline characters. If there is no newline character, we add the new setting onto the same line. If there IS a newline character, we calculate the indentation on the last line of the object, and use that indentation, adding the setting on its own line.

We will then have to add the indentation to a value node. This should not be particularly difficult; we can have a new protected method on ConfigNodeComplexValue that will take in an indentation string and insert it after every newline it sees. This will ensure consistent indentation if the parsed value was a multi-line object or array.

The only problem I can see with this approach is the case in which we're inserting a multi-line value into a single-line object. So in a case like this:

a { b { c { } } }, if we wanted to insert

{
    e : 12
    f : 13
}

at path a.b.c.d, I'm not sure of a good way to handle that indentation. We COULD add a new-line to the object at c, but then the question becomes how much do we indent that? What if the map we want is contained entirely on a single line, but a parent map a couple levels up is indented?

Mostly I just wanted to run this approach by you, and see if you had any suggestions/feedback, or if you'd prefer a different approach.

Thanks!

havocp commented 9 years ago

Philosophically I think this can be a pretty simple best-effort algorithm - if the file is nicely-formatted already in a reasonable way, then we try to match that, but if the file is too creative then too bad. As long as we produce a valid file it's ok to be a little ugly in corner cases.

For a { b { c { } } } if a itself is indented, you could use the a indent to decide on the spaces-per-nesting-level, and then indent based on that ? like if a is indented two spaces and is nested once, then we have two-space indentation; so b should be indented 4 spaces, c indented 6 spaces, etc. ?

I don't mean to imply you'd reformat the existing stuff; just if you add fields inside c you'd use this to decide on their indentation?

Anyway I think what you describe sounds fine. Let's just do something simple that works in common cases (and always produces a valid file, even in the corner cases where it gets the indentation a little weird).

MSLilah commented 9 years ago

Sounds good! Thanks for the feedback as always!

havocp commented 9 years ago

I think we can declare victory on this issue and close it! There are some more APIs we want to do, but I'll make new issues for those (or if I don't, anyone is welcome to of course). Thanks so much for the work on this.