blarghmatey opened this issue 4 years ago
I like this idea a lot! Definitely a config flag (CLI_MERGE_DATA = True?) to enable this as it's quite specific functionality, but I can see many use-cases for this.
It might be worth looking at pulling in something like https://github.com/toumorokoshi/deepmerge as a dependency to handle that functionality. It's certainly possible to implement as part of PyInfra if the preference is to avoid adding new third-party dependencies.
@Fizzadar just curious what your thoughts are on bringing in a dependency to handle this or implementing it natively to reduce surface area. I'm happy to take a crack at it either way but figured I should get your take on it before spending the time so that it fits your philosophy and goals for the project.
@blarghmatey I think an external package is a good idea here, given potential complexity of this. Did you have any in mind?
I'd like to keep the dependency tree as small as possible, but this could be bypassed (if the downstream package itself has many dependencies) by specifying it as an extra requirement, similar to yaml for Ansible inventory parsing (https://github.com/Fizzadar/pyinfra/blob/master/setup.py#L118).
I've been using Pydantic in multi-stack/environment Pulumi IaC projects.
It was a game changer for not only managing hierarchical deployment configurations but also validating and serializing multiple types of input/output in a strongly typed fashion.
The BaseSettings module, which now lives as a standalone pydantic_settings library, is excellent for populating configurations from various sources (Python, JSON, YAML, TOML, env vars, etc.).
We've even added custom setting sources to fetch outputs from IaC stacks. Settus, which adds capabilities to source configurations from Azure KeyVault, AWS Secrets Manager and more, demonstrates how easy it is to extend.
These would potentially solve most if not all of the challenges mentioned in this issue with ease – with tons of additional benefits.
When I started using pyinfra I immediately adopted these for managing deployment configurations, serializing configurations used as input to various pyinfra operations, and even validating fact outputs given the lack of type hints in 2.x – removing the need for isinstance() checks and assertions, and resulting in much cleaner code and fail-safe operations.
Pydantic has very few external dependencies (typing-extensions and annotated-types). I don't know what you think of bringing in a dependency like that, but personally, I'm never looking back.
Would love to share my experiences or help out implementing something of this nature (optionally with Pydantic) if there's interest.
Describe the solution you'd like
In the existing implementation of the data hierarchy, there is a good way of handling overriding of flat values, but for the case where you might want to use nested structures (e.g. dictionaries, lists, sets) it might be useful to allow for merging of those structures as you traverse the hierarchy. This could be made configurable using the config functionality so that it can be set on a per-module basis.
For instance, if I have a deploy where I would like to install and configure a MySQL database, I can set the default set of configuration options in all.py as a dictionary of key/value pairs. In a staging.py file that targets hosts in a pre-production environment, I can then specify a dictionary that overrides a subset of those values. This can currently be achieved by flattening the structure, specifying all of the keys at the top level, and prefixing them with a namespace for the deploy, but a richer data structure provides implicit namespacing. It also provides the advantage of being able to render configuration files intended to be JSON or YAML by simply serializing the merged result of the data object and writing that as the contents of the file.
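The requested behaviour can be sketched in a few lines of pure Python: a recursive merge applied as the hierarchy is traversed (all.py, then staging.py). The function name and data keys below are illustrative, not pyinfra API:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested dicts in `override` are merged
    into `base`; all other values in `override` win outright."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# all.py: full default MySQL configuration
all_data = {
    "mysql": {
        "port": 3306,
        "bind_address": "127.0.0.1",
        "innodb": {"buffer_pool_size": "1G"},
    }
}

# staging.py: overrides only a subset of the nested keys
staging_data = {"mysql": {"innodb": {"buffer_pool_size": "256M"}}}

merged = deep_merge(all_data, staging_data)
# {"mysql": {"port": 3306, "bind_address": "127.0.0.1",
#            "innodb": {"buffer_pool_size": "256M"}}}
```

Serializing `merged` with json.dumps or yaml.dump would then directly produce the rendered configuration file described above.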