Closed salotz closed 1 year ago
FrozenConfiguration.from_dict
converts dict
s to FrozenConfiguration
objects because it is the easiest way to get immutable mappings.
If you are just interested in conveniently writing these to YAML, you can just use upsilonconf.save_config
with a filename that ends with ".yaml"
or ".yml"
.
This being said, the main problem is that upsilonconf does not really support sequences of configurations at this point. The main reason is because I want to reserve them for representing configuration grids (e.g. for hyper-parameter search). However, if you have a specific need for lists of configs, this is something I am open for and can look into. If this is the case, I would appreciate it if you could start a new issue and explain your use case for sequences of sub-configs.
PS: To be precise, this is a particular side-effect of FrozenConfiguration.from_dict
.
If you would use PlainConfiguration.from_dict
, you'd find that the dict
s in your list are not recursively converted (because a list
of dict
is not considered part of the configuration currently).
I see, kind of. In my use case they would be YAML files written by hand.
The main reason is because I want to reserve them for representing configuration grids (e.g. for hyper-parameter search).
IMO that is a feature that should be built on a lower level general purpose config system (which I thought was the goal of upsilonconf and one reason I'm not using e.g. hydra on top of OmegaConf). And I can use another hyperparameter matrix tool for that like ml_collections. E.g. a special schema of a config file that that can specify matrices of parameters that can then be used to generate specific full configs.
So that said I'm not sure its another issue, except that it competes with the other goals you've mentioned here.
my use case they would be YAML files written by hand.
Can you be a bit more specific about what you want to achieve by using lists?
After all, even if the config is written by hand, it would be possible to just add a key for each list item (e.g. item0
, item1
, ...).
Do you wish to represent configuration options (similar to what we need for hyper-parameter searches), is it something like *args
vs **kwargs
or is it something entirely different?
E.g. a special schema of a config file that can specify matrices of parameters
It seems that you are interested in using lists to represent different options for configurations. Is this correct?
that is a feature that should be built on a lower level general purpose config system
You are probably right. Even in the case of hyper-parameter search, the representation of configuration options should probably happen in this library.
I'm not sure it's another issue
You are right (again). I completely missed the title and thought this was about saving configuration objects (rather than full support for list
values).
Given my current understanding of your issue, I think the following tweaks should solve the problem:
Options
class to represent a list
containing different value options for a parameter.ConfigurationGrid
class that allows to index and iterate the different configurations (with one of the options filled out).to_grid
method that allows to convert a configuration object with Options
values to be represented as a grid.Is this something you could work with?
I should clarify as I realized I seemed to be contradictory in my comments. What I am doing is round-tripping the configuration. A user might write the below config but I want to output at runtime a copy of the final resolved configuration which may contain defaults, application version, etc. That is where I run into this issue.
After all, even if the config is written by hand, it would be possible to just add a key for each list item (e.g. item0, item1, ...).
This is what I am doing currently. Basically in the case where you wouldn't be able to give these sections keyed names, or they are inherently anonymous things. For an example you are parametrizing something that needs a list of things but doesn't need names for them:
rocks:
- size: 10
weight: 30
- size: 3
weight: 5
I don't care what the name of the rocks are but I do need to allow for the user to specify an arbitrary number of them.
I would expect behavior from to_dict
to output the following (or another method -- e.g. destructure
-- if you want to keep this one within a particular use case although I think it would be unnecessarily confusing):
{"rocks" : [{"size" : 10, "weight" : 30}, {"size" : 3, "weight" : 5}]}
So currently the workaround for my particular use case isn't so bad to name each "rock" but I think its something that should definitely be supported as it seems like an unnecessary constraint on the ergonomics of want to provide to my users.
Given my current understanding of your issue, I think the following tweaks should solve the problem:
Pardon me if I'm misinterpreting or misjudging the purpose of the library, but I think you are overfitting the concepts in the solution to that of parameter searches still. My list items are not "options" they are just lists of things. An "option" could be an interpretation, but ultimately higher level. I'm thinking of something much simpler that is just converting to and from raw "dicts and lists" to "Configuration" objects where the "Configuration" objects have support for merging (the access syntax is nice too I guess). This is all I really care about. I'm doing the YAML parsing/writing myself (for subtle issues like handling scientific notation properly) and I'll be handling the structuring of the values to real Python types (e.g. https://catt.rs/en/stable/index.html).
That said the hyper-parameter framework (like hydra) is something I am interested in. I've used hydra before and my only real complaint is the ANTLR dependency, as I often have to build everything from source and having a JDK in the mix makes that really annoying, especially for what should ultimately be a very simple pure-python library. I'd be happy to use something that replaces it built on upsilon, but not what I'm interested in here. I see the Upsilon/Omegaconf layer as a very nice unbloated unix-y tool that can be used in so many other places!
I also would be willing to contribute as time allows and if anything becomes a real blocker. Especially if the vision in the docs is maintained as I would probably end up writing my own version of it as there is currently nothing filling the niche.
Hopefully that isn't too much unsolicited advice
Thank you for the additional information. That does make things a little clearer.
Just to make sure I understand you correctly, are the following statements correct?
from_dict
, to_dict
.Also, would you be able to answer the following questions:
dict
s inside the list
be represented as configuration objects?This final question probably requires some background information.
Configuration objects are designed to conveniently represent kwargs
of python functions.
After all, kwargs
are typically used to "configure" functions in python.
Given your example of the rocks config, I would assume that there is some function of the form def func(x, y, z, rocks)
and this function can be called using func(x, y, z, rocks=config.rocks)
or func(x, y, z, **config)
.
Is the goal to call some sub_func
with the different configurations in the list or is the body of func
agnostic of the kind of elements that are passed through the rocks
kwarg?
Sorry for the interrogative tone, but I just want to make sure that I understand you and your use-case correctly.
PS: I just realised that round-trips on mutable objects inside of an immutable configuration object would require to keep track of the original (mutable) types in the configuration. I am not sure if that kind of overhead is reasonable. What do you think?
Sorry for the interrogative tone, but I just want to make sure that I understand you and your use-case correctly.
No worries! Happy to answer any questions.
Just to make sure I understand you correctly, are the following statements correct? the items in the list don't represent configuration options, you don't need any special operations on these list values, you do need an immutable configuration object. you do need round-trip functionality for from_dict, to_dict.
Yes to all.
should the dicts inside the list be represented as configuration objects?
Yes
how do you plan to use these items in your code?
A simple but illustrative example:
rocks = RockArray([Rock(**rock) for rock in config.rocks])
In the large and for converting to structured objects I would use a system like cattrs (linked above), which kind of leads into the background info.
This final question probably requires some background information. Configuration objects are designed to conveniently represent kwargs of python functions. After all, kwargs are typically used to "configure" functions in python.
You also have the *args
you can pass as well.
Given your example of the rocks config, I would assume that there is some function of the form def func(x, y, z, rocks) and this function can be called using func(x, y, z, rocks=config.rocks) or func(x, y, z, **config). Is the goal to call some sub_func with the different configurations in the list or is the body of func agnostic of the kind of elements that are passed through the rocks kwarg?
I wouldn't do this directly. There would be two possibilities:
For instance:
rocks = config.to_dict()["rocks"]
# would look like this, not list of Configurations
# [{"size" : 10, "weight" : 30}, {"size" : 3, "weight" : 5}]
def dig_rock(size, weight):
...
def dig_rocks(rocks)
for rock in rocks:
dig_rock(**rock)
dig_rocks(rocks)
I think as your design goals states I also think its a bad idea to have the Configuration objects passed into anything other than application entrypoint code. Unless you are at a leaf of the configuration that has no sub-structure, in which case its fine to func(**config["rocks"][0])
.
I think to summarize is that I think it should kind of be left up to the user of the library to figure out how to do the slicing/structuring of the data in the configuration object. I'm picking up that you were kind of intending to use dict and tuple unpacking as kind of an all-purpose structuring tool, however I don't think that works at scale since it doesn't work recursively itself. That is why I use a tool like cattrs which is specially built to handle this difficult issue.
For instance consider I want to change my weight attributes to unitted values that needs to be specially parsed:
rocks:
- size: 10
weight: 30 kg
- size: 3
weight: 5 kg
I would need to change to something like:
rocks = [{"size" : rock["size"], "weight" : parse_quantity(rock["weight"]) for rock in config.to_dict()["rocks"]
And this complexity just keeps on scaling linearly with every new config option you add.
My only point is that cleanly handling deeply nested serialized structures like configs is probably something beyond the scope of (in my mind) a simple config merging library. The only general thing you can do is to just recursively convert to lists and dicts and if the user's use case is suitable for tuple/dict unpacking that should flow naturally from that.
PS: I just realised that round-trips on mutable objects inside of an immutable configuration object would require to keep track of the original (mutable) types in the configuration. I am not sure if that kind of overhead is reasonable. What do you think?
Personally I would just rip out anything mutable at all from the library. Its an antipattern and as you are mentioning it makes useful functional things impossible. I plan on never using anything other than FrozenConfiguration.
So I would just consider input "lists" (or tuples) as snapshotted at the point of config creation and forget the original thing existed ever.
An additional example with using cattrs for structuring and my ideal setup:
from attrs import define
@define
class Rock:
size: int
weight: int
@define
class RockArray:
rocks: list[Rock]
rocks = CONVERTER.structure(RockArray, config.to_dict())
Where CONVERTER
is a potentially customized cattrs converter, which can handle things like parsing special fields like in the units example above.
And then for destructuring you can do to plain lists and dicts:
rock_d = CONVERTER.destructure(rocks)
Or to things like serialization formats where you may want to handle format specific things like binary encodings etc.:
rock_json = JSON_CONVERTER.destructure(rocks)
Not trying to sell you on it necessarily but I think it might be the missing piece in the conceptual puzzle of what the boundaries of the config library can and should be, as it relates to this discussion.
I think I understand what you want to achieve, but I still don't quite understand how this is related to configuration or why you would need a configuration class.
The use case you illustrate seems to be more about how to convert user input to structured data (which seems to be a problem that is addressed by cattrs
) than about the configuration of a program.
I still fail to see why it could be a good idea to have a list of configurations as a configuration value unless they are just different sub-configuration options.
Given your example, I would expect something like
with open("config.json") as fp:
rocks = CONVERTER.structure(RockArray, json.load(fp))
to do exactly what you want. I do not see the benefit of using an intermediary configuration class in this setting.
I would just rip out anything mutable at all from the library
Making collections in python immutable comes at a certain cost: all elements in the collection must be immutable as well. If you would create the frozen configuration from immutable objects, round-trips are no problem, but since you create the frozen configuration from mutable objects, the objects need to be converted and the round-trip becomes impossible (without significant overhead).
I think I understand what you want to achieve, but I still don't quite understand how this is related to configuration or why you would need a configuration class.
I see your point. The main thing I use the configuration class for is the merging operations.
So:
with open("config.json") as fp:
config = FrozenConfiguration.from_dict(json.load(fp))
final_config = config | defaults
rocks = CONVERTER.structure(RockArray, final_config.to_dict())
Making collections in python immutable comes at a certain cost: all elements in the collection must be immutable as well. If you would create the frozen configuration from immutable objects, round-trips are no problem, but since you create the frozen configuration from mutable objects, the objects need to be converted and the round-trip becomes impossible (without significant overhead).
I'm fine with it being structurally the same. I.e. if you input a tuple from the config and output a list that is fine with me, but I can see why you wouldn't want to make that choice.
Since Python 3.9, you can merge dictionaries in python natively (PEP-584).
If that is not an option for you, I would advise you to use PlainConfiguration
, because this will not interfere with the incoming types and therefore natively enables round-trips.
I hope this anticlimactic answer works for you.
PS: feel free to fork this project to build an input parser if that is what you need, but I'm afraid that this is not what I had in mind when building upsilonconf.
Seems obvious in hindsight, as I hadn't actually used this in practice as of yet...
I would like to have richer operations than just what is in the PEP though, which I was thinking of adding additional feature requests. For instance merge([config_a, config_b, config_c])
and subtree merge.
Fair enough though thanks for the conversation and sorry for taking so long to get there.
Currently
from_dict
works to recursively go down through lists to get subFrozenConfiguration
objects. By example:And then I cannot serialize this to e.g. YAML simply.