we-like-parsers / pegen

PEG parser generator for Python
https://we-like-parsers.github.io/pegen/
MIT License
150 stars 32 forks source link

Allow parsing deeply nested structures #92

Open MatthieuDartiailh opened 1 year ago

MatthieuDartiailh commented 1 year ago

In enaml (https://github.com/nucleic/enaml), one user reported an error stemming from a recursion error when trying to parse a deeply nested function (see below). The only solution I could think of would for pegen to make use of the trampoline pattern to avoid recursion through the use of generators. However, such a change would be highly invasive and would make the code harder to reason about.

Does anybody has an alternative option to suggest ?

def fn_opcua_get_tree(i_obj_tree):
    """ Get OPCUA nodes
    """
    global g_dict_cnx   # Allow global access

    if i_obj_tree is None or i_obj_tree.str_type != G_OPCUA_TYPE:
        l_dict_valid = {}

        if g_dict_cnx["client"] is not None:
            # List OPCUA root nodes (namespaces)
            l_list_root = [fn_opcua_get_node(
                    g_dict_cnx["client"].nodes.root
                )]

            # Debug
            #pprint.pprint(l_list_root)
            #pprint.pprint(l_list_root.keys())
            #pprint.pprint(l_list_root.items())
            #pprint.pprint(l_list_root['children'].items())
            #print(type(l_list_root))
            #print(type(l_list_root.keys()))
            #print(type(l_list_root.items()))

            #for loop_dict_1 in l_list_root:
                #print(type(loop_dict_1))
                #print(loop_dict_1)

            if True:
                # Create the tree from 'valid' nodes (up to 9 level deep)
                i_obj_tree = \
                    EnamlxTree(
                        str_type = G_OPCUA_TYPE,
                        list_root = [
                            EnamlxNode(
                                str_text = f"{loop_dict_1['name']}",
                                #            enum_COLUMN.Id,    enum_COLUMN.Cls,    enum_COLUMN.Type
                                list_data = [loop_dict_1['id'], loop_dict_1['cls'], loop_dict_1['type']],
                                list_node = [
                                    EnamlxNode(
                                        str_text = f"{loop_dict_2['name']}",
                                        list_data = [loop_dict_2['id'], loop_dict_2['cls'], loop_dict_2['type']],
                                        list_node = [
                                            EnamlxNode(
                                                str_text = f"{loop_dict_3['name']}",
                                                list_data = [loop_dict_3['id'], loop_dict_3['cls'], loop_dict_3['type']],
                                                list_node = [
                                                    EnamlxNode(
                                                        str_text = f"{loop_dict_4['name']}",
                                                        list_data = [loop_dict_4['id'], loop_dict_4['cls'], loop_dict_4['type']],
                                                        list_node = [
                                                            EnamlxNode(
                                                                str_text = f"{loop_dict_5['name']}",
                                                                list_data = [loop_dict_5['id'], loop_dict_5['cls'], loop_dict_5['type']],
                                                                list_node = [
                                                                    EnamlxNode(
                                                                        str_text = f"{loop_dict_6['name']}",
                                                                        list_data = [loop_dict_6['id'], loop_dict_6['cls'], loop_dict_6['type']],
                                                                        list_node = [
                                                                            EnamlxNode(
                                                                                str_text = f"{loop_dict_7['name']}",
                                                                                list_data = [loop_dict_7['id'], loop_dict_7['cls'], loop_dict_7['type']],
                                                                                list_node = [
                                                                                    EnamlxNode(
                                                                                        str_text = f"{loop_dict_8['name']}",
                                                                                        list_data = [loop_dict_8['id'], loop_dict_8['cls'], loop_dict_8['type']],
                                                                                        list_node = [
                                                                                            EnamlxNode(
                                                                                                str_text = f"{loop_dict_9['name']}",
                                                                                                list_data = [loop_dict_9['id'], loop_dict_9['cls'], loop_dict_9['type']],
                                                                                                list_node = [
                                                                                                ]
                                                                                            ) for loop_dict_9 in loop_dict_8['children']
                                                                                        ]
                                                                                    ) for loop_dict_8 in loop_dict_7['children']
                                                                                ]
                                                                            ) for loop_dict_7 in loop_dict_6['children']
                                                                        ]
                                                                    ) for loop_dict_6 in loop_dict_5['children']
                                                                ]
                                                            ) for loop_dict_5 in loop_dict_4['children']
                                                        ]
                                                    ) for loop_dict_4 in loop_dict_3['children']
                                                ]
                                            ) for loop_dict_3 in loop_dict_2['children']
                                        ]
                                    ) for loop_dict_2 in loop_dict_1['children']
                                ]
                            ) for loop_dict_1 in l_list_root
                        ]
                    )
            else:
                # Test tree
                i_obj_tree = \
                    EnamlxTree(
                        str_type = G_OPCUA_TYPE,
                        list_root = [
                            EnamlxNode(
                                str_text = "TEST",
                                list_data = ["", 0, None],
                                list_node = []
                                )
                        ])
        else:
            # Empty tree
            i_obj_tree = \
                EnamlxTree(
                    str_type = G_OPCUA_TYPE,
                    list_root = [
                        EnamlxNode(
                            str_text = "",
                            list_data = ["", "", ""],
                            list_node = []
                            )
                    ])

    return i_obj_tree
pablogsal commented 1 year ago

In the past we have made the generator (in C) output a stack VM:

https://github.com/pablogsal/cpython/tree/pegenvm-2

But that was made for performance and it didn't convince us because debugging it was very hard and performance was basically the same as recursion.

MatthieuDartiailh commented 1 year ago

In Python it would allow to workaround recursion but I agree it is no fun to debug. Guess I would have to re-write my parser in something else than Python... Thanks @pablogsal