yaml / pyyaml

Canonical source repository for PyYAML
MIT License
2.47k stars 507 forks source link

Depth first firing of generators #785

Open expl-info opened 4 months ago

expl-info commented 4 months ago

Currently, the construct_document() fires generators width-first:

    def construct_document(self, node):
        data = self.construct_object(node)
        while self.state_generators:
            state_generators = self.state_generators
            self.state_generators = []
            for generator in state_generators:
                for dummy in generator:
                    pass
        self.constructed_objects = {}
        self.recursive_objects = {}
        self.deep_construct = False
        return data

But it may be better to fire the generators depth-first (my implementation):

    def construct_document(self, node):
        def trigger_generators():
            generators = self.state_generators
            self.state_generators = []
            for generator in generators:
                for dummy in generator:
                    pass
                trigger_generators()

        data = self.construct_object(node)
        trigger_generators()
        self.constructed_objects = {}
        self.recursive_objects = {}
        self.deep_construct = False
        return data

The background on this is a constructor I was writing which produces a generator. The constructor performs a deep merge of a collection of nodes. Some of those nodes are produced by one or more standard "merge" ("<<:") operations. The width-first (default) approach does not produce the correct results, but the depth-first approach does.

My take is that the width-first approach does not resolve the deep node references in a timely way which my generator-based deep merge constructor handles. For example, timeliness is a problem when the initial {} is returned for the merge mapping node data, rather than the intended/generated content.

Running the test suite produces no errors when using the depth-first approach.

Any feedback on this issue is appreciated. In particular, does anyone think that the depth-first approach would not work or is problematic in some way?

nitzmahone commented 4 months ago

I haven't given it a lot of thought, but at a glance, that way seems more likely to hit recursion depth issues on deep documents. A lot of that code is really old and there are probably better ways to do it with more modern generator constructs that would still get what you're after without significantly affecting the callstack depth, but I'd have to think about it some more.