Open jankatins opened 1 year ago
Hi @jankatins - thank you so much for the detailed analysis, it really helps! I wanted to clarify whether the slot-ordering issue you noted after trying your fix is also a problem for your use case. (From your analysis, it seems we have a secondary unstable-sorting issue that I would like to document as well.)
I've not looked further into the order stability issues :-( We currently use this as a kind of documentation (committed to git, viewed via the repository viewer) and unstable order would be not-nice but not a deal-breaker. Up to now it was stable, so it seems something changed, e.g. during linkml versions. But no clue what yet :-(
4/21/23 - I believe the issue persists on larger schemas.
See also #1604
This is likely getting better with @sneakers-the-rat's removal of unnecessary deepcopies from schemaview (used by gen-doc). I'm sure it's not 100% better, but it's on its way! Thanks @sneakers-the-rat.
Describe the bug
I have a big schema: CGMES, generated from multiple UML diagrams (see e.g. http://sogno.energy/cimpy/cimpy.cgmes_v2_4_15.html and https://ontology.tno.nl/IEC_CIM/, who also generated most of it from a similar source). The yaml file has 4,700 lines, and gen-doc produces 600+ files. Most attributes are added via slots, and the class hierarchy uses lots of inheritance where slots are defined on the parent.
Running `gen-doc --metadata -d eq/docs eq/all.yaml` takes >10min. I spied on it with `py-spy` and this is the observed trace:
I interpret this as the two deepcopies in `SchemaView.induced_slot()` being at fault:
I'm unsure why this needs two deepcopies; at least the last one, on the return line, seems unneeded (at least to my eyes: every assignment is either of a scalar or is deepcopied itself).
The `__bool__` line seems to be
I updated to linkml 1.3.16 and made the above changes (removed the second deepcopy, changed to `is None` checks instead of conversion to bool), and it reduced one runtime from ~17min to ~11min. Unfortunately there were differences in the output ("just" a changed order of attributes, as far as I can see), but I see similar changes with the unchanged version as well (it seems to be either unstable sorting in slots or a difference from an old version which we used to check in these generated doc files).
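To illustrate the two changes described above, here is a minimal sketch. It does not use linkml at all: `FakeSlot`, `induce_twice` and `induce_once` are hypothetical stand-ins, and the real `SchemaView.induced_slot()` does considerably more. It only shows why a second `deepcopy` on the return line adds cost without changing the result, and why `if obj:` can be more expensive than `if obj is not None:` when the class defines `__bool__`.

```python
import copy
import timeit
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeSlot:
    """Hypothetical stand-in for a slot definition; NOT a linkml class."""
    name: str
    range: Optional[str] = None
    annotations: Dict[str, str] = field(default_factory=dict)
    # Counts how often truthiness was evaluated; excluded from equality.
    bool_calls: int = field(default=0, compare=False)

    def __bool__(self) -> bool:
        # Simulates a truthiness check that has to inspect the object.
        self.bool_calls += 1
        return bool(self.name)


def induce_twice(slot: FakeSlot) -> FakeSlot:
    """Copy on entry and again on return (the pattern questioned above)."""
    induced = copy.deepcopy(slot)
    if induced.range is None:
        induced.range = "string"
    return copy.deepcopy(induced)  # redundant: `induced` is already private


def induce_once(slot: FakeSlot) -> FakeSlot:
    """Same result with a single copy."""
    induced = copy.deepcopy(slot)
    if induced.range is None:
        induced.range = "string"
    return induced


def count_bool_calls(check, n: int = 1000) -> int:
    """Run `check` n times on one slot and report how often __bool__ ran."""
    slot = FakeSlot("voltage")
    for _ in range(n):
        check(slot)
    return slot.bool_calls


def compare_timings(n: int = 2000) -> Dict[str, float]:
    """Time both variants; absolute numbers depend on the machine."""
    slot = FakeSlot("voltage", annotations={str(i): "x" for i in range(50)})
    return {
        "twice": timeit.timeit(lambda: induce_twice(slot), number=n),
        "once": timeit.timeit(lambda: induce_once(slot), number=n),
    }
```

Calling `compare_timings()` gives a rough feel for the overhead of the extra copy on such a toy object; it says nothing definitive about how much of the gen-doc runtime is attributable to it.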
Stack details!
Looking at the stacks (via `sudo py-spy record -p 52631 --format raw -d 20`), this is the slightly edited output (before the above changes, but afterwards it's basically similar, only with one place of the deepcopy): All the lines should be prefixed by
To reproduce
Probably: generate a schema file with lots of slots and 4-5 levels of inheritance.
Unfortunately, I cannot share the yaml file here :-(
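Since the original schema cannot be shared, a synthetic one along the lines described above might serve as a stand-in. The following is only a sketch: the class and slot names, the `https://example.org/synthetic` id, the default sizes, and the `synthetic.yaml` filename are all made up, and it is an assumption (not verified) that a schema of this shape triggers the same slowdown.

```python
from pathlib import Path


def build_schema(depth: int = 5, classes_per_level: int = 20,
                 slots_per_class: int = 10) -> str:
    """Return LinkML YAML text with `depth` inheritance levels.

    Each class at level N inherits (via is_a) from the class with the
    same index at level N-1 and adds its own slots, so leaf classes
    accumulate slots defined on all their ancestors.
    """
    header = [
        "id: https://example.org/synthetic",  # made-up schema id
        "name: synthetic",
        "prefixes:",
        "  linkml: https://w3id.org/linkml/",
        "  synthetic: https://example.org/synthetic/",
        "default_prefix: synthetic",
        "imports:",
        "  - linkml:types",
        "default_range: string",
        "",
    ]
    slot_lines = ["slots:"]
    class_lines = ["classes:"]
    for level in range(depth):
        for i in range(classes_per_level):
            class_lines.append(f"  Level{level}Class{i}:")
            if level > 0:
                class_lines.append(f"    is_a: Level{level - 1}Class{i}")
            class_lines.append("    slots:")
            for j in range(slots_per_class):
                slot = f"level{level}_class{i}_slot{j}"
                slot_lines.append(f"  {slot}:")
                slot_lines.append("    range: string")
                class_lines.append(f"      - {slot}")
    return "\n".join(header + slot_lines + [""] + class_lines) + "\n"


def write_schema(path: str = "synthetic.yaml", **kwargs) -> Path:
    """Write the generated schema to `path` and return the path."""
    target = Path(path)
    target.write_text(build_schema(**kwargs), encoding="utf-8")
    return target
```

After `write_schema()`, the same invocation as above can be pointed at the generated file, e.g. `gen-doc --metadata -d docs synthetic.yaml`, increasing `depth` and `slots_per_class` until the runtime becomes noticeable.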
Expected behavior
gen-doc should be fast.