ssato / python-anyconfig

Python library provides common APIs to load and dump configuration files in various formats
MIT License
278 stars 31 forks source link

Backends are inconsistent with each other #73

Open nolar opened 7 years ago

nolar commented 7 years ago

Example:

Consider the following configs in YAML & XML, which are made friendly to the users (who create these configs). YAML is "canonical" and directly maps to the expected Python dicts.

config:
  db:
    type: redis
    host: 1.2.3.4
    port: 6379
  layer:
    silent: yes
    name: abc
environment:
  mode: static
  tasks:
    - RebootBeforeStart
    - MountLocalHDD
  xservers:
    - ip: 10.20.30.40
      username: root
  pservers:
    - ip: 20.30.40.50
      username:  user
<test-run>
  <config>
    <db type="redis" host="1.2.3.4" port="6379"/>
    <layer silent="yes">abc</layer>
  </config>
  <environment mode="static">
    <task>RebootBeforeStart</flag>
    <task>MountLocalHDD</flag>
    <xserver ip="10.20.30.40" username="root" />
    <pserver ip="20.30.40.50" username="user" />
  </environment>
</test-run>

The actual outcome:

Now, I want to use the unified parser with multi-file loading from a set of expected paths, and the configs can be at any of them. As a developer, I meet lots of inconsistencies caused by multiple formats per se, and with how they are interpreted by anyconfig:

The XML backend returns the structures which contains the meta-information about the structure of the configs, and thus breaks easy access to them. E.g.:

Presence of @attrs, @text & @children keys is basically not under my control, because as there are some conflicts in the XML files, it does not raise an invalid format error, but alters the data structures. Text contents are sometimes set as a string, but if there are attrs, they are in @text.

>>> anyconfig.load(xml_files, ignore_missing=True)
{'test-run': {'config': {'db': {'@attrs': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'}},
                         'layer': {'@attrs': {'silent': 'true'}, '@text': 'abc'}},
              'environment': {'@attrs': {'mode': 'static'},
                              '@children': [{'task': 'RebootBeforeStart'},
                                            {'task': 'MountLocalHDD'},
                                            {'xserver': {'@attrs': {'ip': '10.20.30.40', 'username': 'root'}}},
                                            {'pserver': {'@attrs': {'ip': '20.30.40.50', 'username': 'user'}}}]}}}

Even if I request merging of the attrs, it is still not predictable: instead of having attrs merged on the same level as @text, they are still stored in @attrs sub-dict.

>>> anyconfig.load(files, ignore_missing=True, merge_attrs=True)
{'test-run': {'config': {'db': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'},
                         'layer': {'@attrs': {'silent': 'true'}, '@text': 'abc'}},
              'environment': {'@attrs': {'mode': 'static'},
                              '@children': [{'task': 'RebootBeforeStart'},
                                            {'task': 'MountLocalHDD'},
                                            {'xserver': {'ip': '10.20.30.40', 'username': 'root'}},
                                            {'pserver': {'ip': '20.30.40.50', 'username': 'user'}}]}}}

Also, if there are some duplicated nodes in XML, the whole children list is made as a list, not only those duplicated nodes.

The expected outcome:

{'config': {'db': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'},
            'layer': {'silent': 'true', '@text': 'abc'}},
 'environment': {'mode': 'static',
                 'task': ['RebootBeforeStart',
                          'MountLocalHDD'],
                 'xserver': {'ip': '10.20.30.40', 'username': 'root'},
                 'pserver': {'ip': '20.30.40.50', 'username': 'user'},
                },
}

Or, for multiple <pserver> tags:

{'config': {'db': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'},
            'layer': {'silent': 'true', '@text': 'abc'}},
 'environment': {'mode': 'static',
                 'task': ['RebootBeforeStart',
                          'MountLocalHDD'],
                 'xserver': {'ip': '10.20.30.40', 'username': 'root'},
                 'pserver': [{'ip': '20.30.40.50', 'username': 'user'},
                             {'ip': '20.30.40.60', 'username': 'user'}],
                },
}

Problem:

  1. As a result of such significant differences, as a developer, I have to go through the whole config structure and merge the attrs/children/texts myself. Even if I use merge_attrs=True. Hence, I produce a lot of code for post-parsing, thus basically doing the parsing myself; the library just "read" the files in different formats, but it is not what was expected.

  2. If there are multiple configs in different formats with such structural differences, they become non-mergeable, or merge improperly (mostly because I have to do this afterwards, not during anyconfig processing & merging with the merge strategies).

Suggestions:

Change the logic of merging for the XML backend, that by default is most consistent with YAML output:

Notes:

In the examples, there are differences between singles in XML and plurals in YAML. This corresponds to the semantics of the XML & YAML formats. I believe, this cannot be handled automatically for obvious reasons. But, even cfg.environment.task & cfg.environment.tasks possible keys are easier to interpret where needed with some basic knowledge of the domain.

Also, there is an issue with config.layer.name (YAML) vs. config.layer.@text (XML), which also should be solved with some domain knowledge, and can be ignored for this problem with XML-vs-YAML inconsistencies.

nolar commented 7 years ago

This issue is similar to #62 regarding inconsistencies of the parsed outcome, but there is a different outcome structure is suggested, which also contains meta-information on the original config (attrs vs. children vs. texts), which is not needed.

ssato commented 7 years ago

@nolar Thanks a lot for you suggestion and detailed explanation!

Honestly, I don't think it's possible to make YAML/JSON/... and XML backends become consistent always like you said. These are very different formats for different purposes, especially XML is and I do not want to force converting XML to YAML/JSON/... w/ side effects such as lost of structure and semantic information hidden in the original XML by default (options may be OK). For example, even names of attrs and children nodes look same, these represent different kind of things in general from my experiences.

And I'm sorry but I think that your XML snippet is not good example to discuss this because it's does not look canonical and consistent w/ YAML one; maybe it should be like 'xml_s_2' in the following:

In [1]: import anyconfig

In [2]: xml_s = "<a x='X'><b>0</b><c>1</c></a>"

In [3]: anyconfig.loads(xml_s, ac_parser="xml", merge_attrs=True)
Out[3]: {'a': {'b': '0', 'c': '1', 'x': 'X'}}   # merge_attrs worked as expected in this case.

In [4]: xml_s_2 = """<test-run>
   ...:   <config>
   ...:     <db type="redis" host="1.2.3.4" port="6379"/>
   ...:     <layer silent="yes">abc</layer><!-- I have no idea it should be like, but attrs and text node should not be mixed if these represent objects of same kind. -->
   ...:   </config>
   ...:   <environment mode="static"><!-- Likewise, 'mode' should not be an attr but It's OK in this case because anyconfig can process it in a same manner w/ other children nodes if merge_attrs == True -->
   ...:     <tasks> <!-- A list element node is necessary if elements are list. -->
   ...:       <task>RebootBeforeStart</task> 
   ...:       <task>MountLocalHDD</task>
   ...:     </tasks>
   ...:     <xserver ip="10.20.30.40" username="root" /><!-- A list element node is required to make the structure same as YAML example like <xservers><xserver ip=.../>...</xservers>. -->
   ...:     <pserver ip="20.30.40.50" username="user" /> <!-- Likewise. -->
   ...:   </environment>
   ...: </test-run>
   ...: """

In [5]: anyconfig.loads(xml_s_2, ac_parser="xml", merge_attrs=True)
Out[5]:
{'test-run': {'config': {'db': {'host': '1.2.3.4',
    'port': '6379',
    'type': 'redis'},
   'layer': {'@attrs': {'silent': 'yes'}, '@text': 'abc'}},
  'environment': {'mode': 'static',                      # merge_attrs=True worked here.
   'pserver': {'ip': '20.30.40.50', 'username': 'user'},
   'tasks': [{'task': 'RebootBeforeStart'}, {'task': 'MountLocalHDD'}],
   'xserver': {'ip': '10.20.30.40', 'username': 'root'}}}}

In [6]: xml_s_1 = "<a x='X'><b>0</b><b>1</b></a>"

In [7]: anyconfig.loads(xml_s_1, ac_parser="xml", merge_attrs=True)
Out[7]: {'a': {'@attrs': {'x': 'X'}, '@children': [{'b': '0'}, {'b': '1'}]}}  # These cannot be merged.

In [8]: