Open sneakers-the-rat opened 4 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 62.88%. Comparing base (
27b9158
) to head (ee4e4b2
). Report is 31 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
would that be helpful? would that look like just running the main linkml tests with the PR'd version of linkml-runtime, or did you have something else in mind? for now I can just run them and post results
would that look like just running the main linkml tests with the PR'd version of linkml-runtime
That would work! IMO it's a bit clunky and it's a big ask any time there is a schemaview change but induced slots are central enough it's worth doing here, thx!
So yes test failures, but i honestly can't tell if it's from this or not.. i'll investigate more thoroughly later, but would advise not merging yet. Tests also took 2 hours to run and i can't rly fathom why. will also profile that later.
====================================================== short test summary info =======================================================
FAILED tests/test_issues/test_issue_494.py::test_jsonschema_validation - AssertionError: Regex pattern did not match.
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_config_extends_recommended - AssertionError: 1 != 2
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_directory_of_files - AssertionError: 'schemas/schema_a.yaml' not found in "/private/var/folders/b8/28n_6_zn7flgh40028jxd8j80000gn/T/tmp0wrjounp/schema...
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_no_config - AssertionError: 1 != 2
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_no_schema_errors - AssertionError: 1 != 0
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_processes_dot_files_in_directory_when_a_option_provided - AssertionError: 'schemas/schema_a.yaml' not found in "/private/var/folders/b8/28n_6_zn7flgh40028jxd8j80000gn/T/tmpf87woivi/schema...
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_processes_dot_files_in_directory_when_all_option_provided - AssertionError: 'schemas/schema_a.yaml' not found in "/private/var/folders/b8/28n_6_zn7flgh40028jxd8j80000gn/T/tmps6ym618p/schema...
FAILED tests/test_linter/test_cli.py::TestLinterCli::test_validate_schema - AssertionError: 1 != 2
FAILED tests/test_linter/test_rule_canonical_prefixes.py::TestCanonicalPrefixesRule::test_custom_context - AttributeError: 'NoneType' object has no attribute 'append'
FAILED tests/test_linter/test_rule_canonical_prefixes.py::TestCanonicalPrefixesRule::test_default_merged_context - AttributeError: 'NoneType' object has no attribute 'append'
ERROR tests/test_workspaces/test_example_runner.py::test_load_from_dict - AttributeError: 'NoneType' object has no attribute 'append'
ERROR tests/test_workspaces/test_example_runner.py::test_example_runner - AttributeError: 'NoneType' object has no attribute 'append'
ERROR tests/test_workspaces/test_example_runner.py::test_example_runner_non_defaults - AttributeError: 'NoneType' object has no attribute 'append'
========== 10 failed, 6151 passed, 964 skipped, 3 xfailed, 4 warnings, 3 errors, 261 subtests passed in 6882.77s (1:54:42) ===========
OK figured out what was wrong - it was just that i had installed my local version of prefixmaps in my linkml environment to test https://github.com/linkml/prefixmaps/pull/69 and hadn't fixed a bug in it.
tests pass now, except for test_issue_494.py
which fails because of this, which is pending in another PR: https://github.com/linkml/linkml/pull/1704/commits/f148a1882df1fd162f24b6dc8b192491116dfa16
this is good 2 merge i think, but go ahead and test it urself!
triggering a manual build to test the new workflow in
Omg I hope it works, lmk if it doesnt and can debug
Ok now that the action is in main i can test it on my fork. Sorry for the runaround with these CI actions, its specifically a weird thing for forks. Ill make it work
Waiting for tests to run on the array branch and browsing through issues and PRs, noticed there was one for optimizing
get_classes_by_slot
https://github.com/linkml/linkml-runtime/pull/300 and any time there's aninduced_slot
perf thing i gotta check it out now.For non-induced slots, #300 is a perf wash (for 100 rounds through all slots in biolink, 6.77 before and 6.81 after), but when i went to profile it with induced slots i was surprised at how long the estimated time for a single round through was supposed to take - roughly 3 hours (~160,000x slower vs non-induced slots).
Using sets is a great idea for removing the step of making something unique at the end. we can take advantage of that further by
break
ing after the first set addition (since future set additions will do nothing).the problem is that we are still making basically the most expensive call you can do with schemaview, which is to call
induced_slot
on every combination of slot and class for a schema.induced_slot
is expensive in itself, but it's usually fine to do because you're only calling it in the context of a few classes and the caching can handle how slow it is, but since the call is every slot in the context of every class, no caching can be done.The fix is relatively simple - since
get_classes_by_slot
only cares about the existence of the slot on the class, not on its full induced form, there is no need to useinduced_slot
at all.class_induced_slots
iterates over the slots given byclass_slots
: https://github.com/linkml/linkml-runtime/blob/7c311d975422ddf990ae524fe6d969326c1b9261/linkml_runtime/utils/schemaview.py#L1368so any slots that
class_induced_slots
knows about,class_slots
knows about. The implementation in this PR also avoids iterating over the classes twice, although that's also a perf wash because iterating through a list and comparing strings takes basically no time.So for the simple timing function
where i break out of getting the induced slots after 5 slots because it takes so long,
you get:
So i'll call the regular difference in the noise and the induced slot call roughly 1100x faster. A full run over biolink takes 12s vs ~3h.
attached profiling results where
beforest
is before 300,before
is 300 andafter
is this PR.profiling-get_classes_by_slot.zip