linkml / linkml

Linked Open Data Modeling Language
https://linkml.io/linkml
Other
311 stars 96 forks source link

gen-doc has bad runtime characteristics #1214

Open jankatins opened 1 year ago

jankatins commented 1 year ago

Describe the bug

I've a big schema: CGMES, generated from multiple UML diagrams (see e.g. http://sogno.energy/cimpy/cimpy.cgmes_v2_4_15.html and https://ontology.tno.nl/IEC_CIM/, who also generated most of it from a similar source). The yaml file has 4.700 lines, gen-doc produces 600+ files. Most attributes are added via slots and the class hierarchy uses lot's of inheritance where slots are defined on the parent.

Running gen-doc --metadata -d eq/docs eq/all.yaml takes >10min.

I spied on it with py-spy and this is the observed trace:

image

I interpret that that the two deepcopies are at fault here in SchemaView.induced_slot():

induced_slot = deepcopy(slot)
...
return deepcopy(induced_slot)
...

I'm unsure why this needs two deepcopies, at least the last on the return line seems unneeded (at least to my eys: every assignment is of a scalar or deepcopied itself).

The __bool__ line seems to be

        if class_name:
            cls = self.get_class(class_name, imports, strict=True)
        else:
            cls = None
...
        if cls: # <- THIS, maybe if cls is not None
...
            if not cls: # <- THIS, maybe if cls is None

I updated to linkml 1.3.16 and did the above changes (remove second deepcopy, change to is None checks instead of conversion to bool) and it reduced one runtime from ~17min to ~11min. Unfortunately there were differences ("just" changed order of attributes, as far as I can see), but I have similar changes with the unchanged verison as well (seems to be either unstable sorting in slots or a difference to an old version which we used to check in these generated doc files).

Stack details! Looking at the stacks (via `sudo py-spy record -p 52631 --format raw -d 20`), this is the slightly edited output (before the above changes, but afterwards it's basically similar, only with one place of the deepcopy): All the lines should be prefixed by ` (gen-doc:8);__call__ (click/core.py:1130);main (click/core.py:1055);invoke (click/core.py:1404);invoke (click/core.py:759);cli (linkml/generators/docgen.py:814);serialize (linkml/generators/docgen.py:189);render (jinja2/environment.py:1299);root (slot.md.jinja2:53);call (jinja2/runtime.py:298);get_classes_by_slot (linkml_runtime/utils/schemaview.py:1400);class_induced_slots (linkml_runtime/utils/schemaview.py:1216); (linkml_runtime/utils/schemaview.py:1216);` and one line was just that prefix. ``` induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:272);__getattr__ (jsonasobj2/_jsonobj.py:156) 2 induced_slot (linkml_runtime/utils/schemaview.py:1170);__hash__ (linkml_runtime/utils/schemaview.py:134) 5 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:138) 1 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_list (copy.py:203) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:230) 1 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:176) 1 induced_slot (linkml_runtime/utils/schemaview.py:1167);__bool__ (jsonasobj2/_jsonobj.py:166) 4 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_dict (copy.py:229) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:134) 1 induced_slot (linkml_runtime/utils/schemaview.py:1199) 5 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:137) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_list (copy.py:206);deepcopy (copy.py:161) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:272) 1 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:142) 1 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172) 2 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146) 2 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:230) 3 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:142) 1 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:138) 2 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_list (copy.py:206);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:230) 1 induced_slot (linkml_runtime/utils/schemaview.py:1166) 4 induced_slot (linkml_runtime/utils/schemaview.py:1172) 11 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:264) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231) 2 induced_slot (linkml_runtime/utils/schemaview.py:1170);__hash__ (linkml_runtime/utils/schemaview.py:134) 8 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_atomic (copy.py:183) 2 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:138) 1 induced_slot (linkml_runtime/utils/schemaview.py:1190);__setattr__ (jsonasobj2/_jsonobj.py:163);__init__ (jsonasobj2/_jsonobj.py:65) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:159) 1 induced_slot (linkml_runtime/utils/schemaview.py:1200);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146) 4 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:138) 1 induced_slot (linkml_runtime/utils/schemaview.py:1190) 3 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:138) 1 induced_slot (linkml_runtime/utils/schemaview.py:1148);deepcopy (copy.py:172);_reconstruct (copy.py:271);deepcopy (copy.py:146);_deepcopy_dict (copy.py:231);deepcopy (copy.py:134) 2 induced_slot (linkml_runtime/utils/schemaview.py:1165) 2 ```

To reproduce probably: generate a schema file with lot's of slots and 4-5 level inheritance

Unfortunately, I cannot share the yaml file here :-(

Expected behavior

gen-doc should be fast.

sierra-moxon commented 1 year ago

Hi @jankatins - thank you so much for the detailed analysis, it really helps! I wanted to clarify whether or not the order of slots issue you note here after the fix you tried, is also an error for your use case? (from your analysis, it seems like we have a secondary unstable sorting issue that I would like to document as well).

jankatins commented 1 year ago

I've not looked further into the order stability issues :-( We currently use this as a kind of documentation (committed to git, viewed via the repository viewer) and unstable order would be not-nice but not a deal-breaker. Up to now it was stable, so it seems something changed, e.g. during linkml versions. But no clue what yet :-(

sierra-moxon commented 1 year ago

4/21/23 - I believe the issue persists on larger schemas.

cmungall commented 1 year ago

See also #1604

sierra-moxon commented 7 months ago

This likely is getting better with @sneakers-the-rat 's removal of unnecessary deepcopys from schemaview (used by gen-doc). I'm sure its not 100% better, but on its way! thanks @sneakers-the-rat.