ssato / python-anyconfig

Python library provides common APIs to load and dump configuration files in various formats
MIT License
277 stars 31 forks source link

Remove annoying to_container argument #66

Closed ssato closed 7 years ago

ssato commented 7 years ago

To reduce strong dependency to m9dicts, remove to_container arguments and make it works only with standard library as much as possible.

ssato commented 7 years ago

Partially improve by the commit b89757d.

nolar commented 7 years ago

This may damage the flexibility of the config parsing. Here is my sample use-case:

I want multiple configs, including YAML, XML, INI configs to be flattened to a unified data structure, with the keys & values (and sometimes lists). The XML parser returns those @text, @attrs, @children keys instead of the desired values. Consider this config:

<config>
    <layer mode="somemode">kvm</layer>
    <env mode="static">
        <server ip="1.2.3.4"/>
        <server ip="2.3.4.5"/>
    </env>
</config>

The resulting structure cannot be flattened by merge_dicts=True option, because there are both texts, attrs & children, and the existing implementation explicitly avoid flattening in this case.

So I made my own dict-like object to flatten all keys (in __setitem__), so that the attrs & the children are merged as equal. The text & attrs case is, however, undecided how to handle, but probably the text part will be mapped to some predefined key. The desired result is:

{
  'config': {
    'layer': {
      'mode': 'somemode',
      'name': 'kvm',   # <-- text content
    },
    'env': {
      'mode': 'static',
      'pserver': [
         ...
      ],
    },
  }
}

How am I supposed to handle that if to_container is removed? Should I just post-process the results, assuming they have came from different sources (YAML vs XML), just by looking at the fields names.

E.g., if I see @attrs, then flatten them to the parent dict. In that case, the precedence of multiple config files can be violated (e.g., a low-priority file has attr <env mode=static></env> set, while the high-priority overrides it with a tag <env><mode>static</mode></env> — but we notice the @attrs and apply a low-priority value on top of the high priority value).

The self-made container could solve this problem by applying the changes on parse time of each single file.

PS: Also, to_container is currently broken and ignored, so it does not work. I've came here to report this, but I see that the option is going to be removed at all.

ssato commented 7 years ago

Thanks a lot for your comments!

Actually, what you pointed at may be related to what I want to resolve in this issue, I guess.

I guess that especially the last one (ac_container/ac_dict) might help resolving your issue.

ssato commented 7 years ago

The XML parser returns those @text, @attrs, @children keys instead of the desired values.

This is intended because I think there is no way to implement mode does flatten them regardless of these types in XML backend generally. For example, I don't think there is a way to flatten the XML snippet in a consistent manner such like this:

<layer mode="somemode">kvm <options nested='true'/></layer>

So, IMHO, you have to modify/convert the nested dict loaded from configuration files anyway. Maybe query with ac_query (JMESPath expression) might help doing such work.

nolar commented 7 years ago

This is why I started to invent my own to_container class, which flattens the data according to my specific (non-generalised) case. This container delegates all @attr & @children assignments to the parent container (taking into account that you create 2 distinct instances in the anyconfig code). The conflicting @text case is ignored for now (solved with merge_attrs=True for my case). Something like this:

import collections
class XMLContainer(collections.MutableMapping):

    def __init__(self, *args, **kwargs):
        super(XMLContainer, self).__init__()
        self._data = {}
        self._parent = None

    def __setitem__(self, key, val):
        if key in {'@text', '@attrs', '@children'} and isinstance(val, XMLContainer):
            val._parent = self
            # NB: do not set it locally. assume it will delegate all sets to us while it exists.
        elif self._parent is not None:
            self._parent[key] = val
        else:
            self._data[key] = val

    def __delitem__(self, key):
        if key in {'@text', '@attrs', '@children'}:
            pass
        else:
            del self._data[key]

    def __getitem__(self, key):
        if key in {'@text', '@attrs', '@children'}:
            return self
        else:
            return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
ssato commented 7 years ago

@nolar I've just released 0.9.0 which removes dependency to m9dicts along with maybe-correct implementation to allow passing user-defined mapping class to backend parsers directly to make results on load data. I guess that now you can accomplish what you want by passing XMLContainer as ac_dict keyword option to load APIs.

I'll close this issue but if you have any problem related, could you please open another issue for that?

ssato commented 7 years ago

@nolar BTW, did you try keyword options available in XML backend such like ac_parse_value=True, merge_attrs=True, etc? (Maybe I should implement an option to eliminate '@children' as much as possible also...)

nolar commented 7 years ago

@ssato It made it a bit easier to work with, but did not solve my problems. I therefore created a new separate issue for my problems: #73