linkml / linkml-runtime

Runtime support for linkml generated models
https://linkml.io/linkml/
Creative Commons Zero v1.0 Universal

Optimize implementation of SchemaView `get_classes_by_slot()` method #281

Open sujaypatil96 opened 8 months ago

sujaypatil96 commented 8 months ago

The `get_classes_by_slot()` method in SchemaView takes an extremely long time to run on the MIxS schema when generating the Applicable Classes table on slot documentation pages, so we are exploring ways to optimize its runtime.
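One common way to cut the cost of repeated per-slot lookups is to invert the class-to-slot relationship once and reuse that index for every slot page, instead of scanning all classes for each slot. The sketch below is a minimal, stdlib-only illustration of that idea, not SchemaView's actual implementation: `build_slot_index` and the toy `class_slots` mapping are hypothetical stand-ins for whatever per-class slot lists a schema API would provide.

```python
from collections import defaultdict
from typing import Dict, List


def build_slot_index(class_slots: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a class -> slots mapping into a slot -> classes index.

    Hypothetical helper: `class_slots` stands in for the per-class slot
    lists a schema API might return.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for class_name, slots in class_slots.items():
        # Visit every class exactly once; each slot accumulates its owners.
        for slot_name in slots:
            index[slot_name].append(class_name)
    return dict(index)


# Toy schema: class name -> slot names (illustrative data only).
toy_schema = {
    "Sample": ["id", "lat_lon", "depth"],
    "Soil": ["id", "lat_lon", "ph"],
    "Water": ["id", "depth"],
}

slot_index = build_slot_index(toy_schema)

# Each slot page now does a dictionary lookup instead of
# re-scanning every class in the schema.
for slot in ["id", "depth", "ph"]:
    print(slot, slot_index.get(slot, []))
```

The trade-off is that the index must be rebuilt (or invalidated) whenever the schema changes, and whether it should reflect induced slots or only directly declared ones depends on what the caller needs.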

codecov[bot] commented 8 months ago

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (b86c1fa) 62.11% compared to head (954c207) 62.10%. Report is 6 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
- Coverage   62.11%   62.10%   -0.01%
==========================================
  Files          63       63
  Lines        8459     8463       +4
  Branches     2169     2170       +1
==========================================
+ Hits         5254     5256       +2
  Misses       2599     2599
- Partials      606      608       +2
```

| [Files](https://app.codecov.io/gh/linkml/linkml-runtime/pull/281?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=linkml) | Coverage Δ | |
|---|---|---|
| [linkml\_runtime/utils/schema\_as\_dict.py](https://app.codecov.io/gh/linkml/linkml-runtime/pull/281?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=linkml#diff-bGlua21sX3J1bnRpbWUvdXRpbHMvc2NoZW1hX2FzX2RpY3QucHk=) | `91.30% <100.00%> (ø)` | |
| [linkml\_runtime/utils/schemaview.py](https://app.codecov.io/gh/linkml/linkml-runtime/pull/281?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=linkml#diff-bGlua21sX3J1bnRpbWUvdXRpbHMvc2NoZW1hdmlldy5weQ==) | `87.81% <100.00%> (+0.02%)` | :arrow_up: |
| [linkml\_runtime/utils/namespaces.py](https://app.codecov.io/gh/linkml/linkml-runtime/pull/281?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=linkml#diff-bGlua21sX3J1bnRpbWUvdXRpbHMvbmFtZXNwYWNlcy5weQ==) | `72.51% <43.75%> (-0.86%)` | :arrow_down: |

sierra-moxon commented 8 months ago

We did some profiling (coincidentally :D) of docgen and found that, for large schemas, the `deepcopy` calls in `schemaview.py` were the most time-consuming part:
https://github.com/linkml/linkml-runtime/blob/687fc53338cc791d0f932c3b9a22c0e8313fd99b/linkml_runtime/utils/schemaview.py#L1201
https://github.com/linkml/linkml-runtime/blob/687fc53338cc791d0f932c3b9a22c0e8313fd99b/linkml_runtime/utils/schemaview.py#L1209
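For anyone wanting to reproduce this kind of measurement, Python's standard `cProfile` and `pstats` modules are enough to see where the time goes. The sketch below is an assumption about how one might profile such a workload, not the profiling setup used here: `fake_induce_all` is a hypothetical stand-in that deep-copies a nested structure once per "slot", to show how `deepcopy` surfaces near the top of a cumulative-time ranking.

```python
import copy
import cProfile
import io
import pstats


def fake_induce_all(schema: dict, n_slots: int) -> list:
    """Hypothetical workload: deep-copy a whole structure once per slot."""
    results = []
    for _ in range(n_slots):
        # Copying the entire structure on every iteration is the costly part.
        results.append(copy.deepcopy(schema))
    return results


# A nested structure standing in for a (much larger) real schema.
big_schema = {f"class_{i}": {"slots": list(range(20))} for i in range(100)}

profiler = cProfile.Profile()
profiler.enable()
fake_induce_all(big_schema, n_slots=10)
profiler.disable()

# Print the five entries with the highest cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
print(buffer.getvalue())
```

Sorting by cumulative time is what makes a helper like `deepcopy` stand out, because its cost is summed across every recursive call it triggers.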

sujaypatil96 commented 8 months ago

Oh, that's very good to know, thank you for digging into this @sierra-moxon 😁

cmungall commented 8 months ago

Good sleuthing! We should definitely avoid using `deepcopy` when calculating induced slots.

But note that induced slots may not be necessary for docgen purposes.
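As an illustration of avoiding `deepcopy`, the sketch below builds an "induced" slot by layering overrides onto a base definition with `dataclasses.replace`, which constructs a new instance from the existing field values (a shallow copy) instead of deep-copying everything first. `SlotDef` and `induce_slot` are hypothetical simplifications, not linkml-runtime classes; the real `SlotDefinition` has many more fields, and SchemaView's induction logic also walks ancestors and `slot_usage`.

```python
import dataclasses
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SlotDef:
    """Hypothetical, simplified stand-in for a slot definition."""
    name: str
    range: Optional[str] = None
    required: bool = False
    description: Optional[str] = None
    aliases: Tuple[str, ...] = ()


def induce_slot(base: SlotDef, usage: Dict[str, object]) -> SlotDef:
    """Return a new slot with non-None `usage` values layered over `base`.

    Unchanged fields (e.g. `aliases`) are shared with `base` rather than
    copied. That is safe here because the dataclass is frozen and its
    fields hold immutable values.
    """
    # Skip None values so they don't erase inherited values.
    overrides = {k: v for k, v in usage.items() if v is not None}
    return dataclasses.replace(base, **overrides)


base = SlotDef(
    name="depth",
    range="float",
    description="Depth of sample",
    aliases=("sample_depth",),
)
induced = induce_slot(base, {"required": True, "description": None})
print(induced)
print(base.required)
```

The catch with skipping `deepcopy` is aliasing: sharing nested values is only safe if nothing mutates them afterwards, which is why this sketch uses a frozen dataclass with a tuple field rather than a list.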

sierra-moxon commented 8 months ago

Changing https://github.com/linkml/linkml/blob/598376ce7f8c11bd3cf31f0ca7e3d5c34770021a/linkml/generators/docgen/slot.md.jinja2#L26 from `True` to `False` in my custom template (along with the other instances in that file) brought docgen on Biolink down from over 2 hours to just under a minute.

sierra-moxon commented 8 months ago

See also linkml issues #1214 and #1604.

cmungall commented 4 months ago

@sujaypatil96 is this replaced by #300?