create_unique sometimes causes excessive recursion

szvsw commented 2 years ago

https://github.com/samuelduchesne/archetypal/blob/d0aed03ebbf015347d05f02fbbe746adfae99038/archetypal/template/umi_base.py#L536

This recursion call sometimes raises a RecursionError: maximum recursion depth exceeded while calling a Python object

I have not yet been able to reliably repro - error is happening in umiverse flask backend.

Sometimes the following steps execute successfully, sometimes, the final step fails with an recursion error.

Fetch records ["MediumOffice_Post1980_2A", "MediumOffice_New2004_2A", "MediumOffice_Pre1980_2A"] from MongoDB (though I have not yet found a single template or collection of templates which always causes the error - my guess is it has to do with the order that the templates arrive as results from the db, if that is not the same every time?)
construct the templates with bld.to_template(...)
construct the template_lib with UmiTemplateLibrary(BuildingTemplates=[<templates>])
run template_lib.unique_components()
convert to dict with template_lib.to_dict()

When it fails, this stack trace occurs:

Traceback (most   recent call last):
--
File "/usr/src/app/project/routes.py", line 1104, in   construct_umi_file_task
template_lib_as_dict = template_lib.to_dict()  # as dict
File   "/usr/local/lib/python3.8/site-packages/archetypal/umi_template.py",   line 579, in to_dict
data.update({"Name":   UniqueName(data.get("Name"))})
File   "/usr/local/lib/python3.8/site-packages/archetypal/template/umi_base.py",   line 511, in __new__
return str.__new__(cls, cls.create_unique(content))
File   "/usr/local/lib/python3.8/site-packages/archetypal/template/umi_base.py",   line 536, in create_unique
return cls.create_unique(name)
File   "/usr/local/lib/python3.8/site-packages/archetypal/template/umi_base.py",   line 536, in create_unique
return cls.create_unique(name)
File   "/usr/local/lib/python3.8/site-packages/archetypal/template/umi_base.py",   line 536, in create_unique
return cls.create_unique(name)
[Previous line repeated 934 more times]
File   "/usr/local/lib/python3.8/site-packages/archetypal/template/umi_base.py",   line 528, in create_unique
match = re.match(r"^(.*?)(\D*)(\d+)$", name)
File "/usr/local/lib/python3.8/re.py", line 191, in match
return _compile(pattern, flags).match(string)
File "/usr/local/lib/python3.8/re.py", line 291, in _compile
if isinstance(flags, RegexFlag):
RecursionError: maximum recursion depth   exceeded while calling a Python object

Seems like there is some sort of issue with the regex matching when it shouldn't, causing the recurse call to keep happening.

It's possible to increase the python recursion limit (default is 1k I believe) if it is actually possible for the count to go past 1k, but that seems like it shouldn't be necessary, and anyways doesn't seem like the most performant solution if it is. I wonder if it makes sense to use a method like keeping track of the count for each base name, and then just incrementing that, rather than trying to determine the current count with the regex and then recursing.

szvsw commented 2 years ago

Here's a little code sample of what I am proposing:

I will try to get around to testing it today/tomorrow

def create_unique(cls, name):
    """Check if name has already been used.

    If so, try to increment until not used.

    Args:
        name:
    """
    if not name:
        return None
    if name not in cls.existing:
        cls.existing[name] = 0
        return name
    else:
        current_count = cls.existing[name]
        cls.existing[name] = current_count+1
        return f'${name}_${current_count+1}'

samuelduchesne commented 2 years ago

It probably means the regex always matches and a new incremental name is created.

I like what you are suggesting above. Short and sweet. Any luck with testing? Testing is essentially checking that the tUmiTemplateEditor can read a template library produced by archetypal. Unfortunately, there isn't an official schema we can test against because the UmiTemplateLibrary does not follow a regular schema.

This is somewhat a side-conversation but I was wondering if you had any thoughts as to how to improve the format of the template library. It looks like it is an ORM serialized into different lists.

szvsw commented 2 years ago

This is somewhat a side-conversation but I was wondering if you had any thoughts as to how to improve the format of the template library. It looks like it is an ORM serialized into different lists.

Do you mean the format at the mongo db level? Or in archetypal/pyumi?

Just took a quick look at the schema in UmiTemplateDB ( `mongodb_schema ) and it's implemented how I would have done it in a vacuum.

From our zoom on tuesday, I gather that main "issue" we are wanting to work around is this notion that all of the entities associated with a template (in the ubem.io context at least) should be unique down to the lowest leaf level once its compiled into an actual object, so that when the user edits a setting, it does not propagate out to other entities - which ultimately kind of obviates the utility of the relational db structure implemented by UmiTemplateDB. The other issue is thinking about how to (maybe) make fetching data from the db more efficient instead of having to crawl through lots of relations - though to me, this doesn't seem like such a bad problem.

One possibility might be to just add a new raw table in the DB which stores raw templates, i.e. a fully unique template which gets dumped to a blob/json and associated with a name/some metadata (e.g. CZ, program, yearstart/end, structural type, etc). essentially the same as what you were suggesting with storing them in an S3 bucket, just in the db so that some extra metadata can be associated with them - however I suppose that metadata could probably just as easily live in blob itself. A separate table which represents a library could then have two main list fields - one which has identifiers for templates as they are currently stored (i.e. as relational objects with lots of subcomponents that will need to be fetched), and one which has identifiers for raw templates. So each library (e.g. the CZ 2A library or the Restaurants library or the All library), you would then get some combination of blobs and/or relational objects which still need to have all of their sub objects acquired. That way if you want a library to just be made up of fully defined raw blobs which should be quick to fetch, you can grab that, or if you want a library to be made up of relational objects with subcomponents you can still do that, or you can have a mix.

Maybe this is not the question you were asking, if not let me know more what you were thinking about it trying to resolve and I will have a think on it.

samuelduchesne commented 2 years ago

Yeah I was more thinking in terms of the structure of umi templates. Maybe we are stuck with the ORM based approach. It is just not meant to be serialized as json, if I am not mistaken; I have to make some real funky acrobatics to parse umi libraries in python.

Btw, feel free to create a pull request if you think your fix solves the recursion issue.

samuelduchesne / archetypal

create_unique sometimes causes excessive recursion #337