linkml / linkml-runtime

Runtime support for linkml generated models
https://linkml.io/linkml/
Creative Commons Zero v1.0 Universal

[perf] Use local schemas if available #307

Closed sneakers-the-rat closed 3 months ago

sneakers-the-rat commented 4 months ago

Edit to add relation to prior issues:

- (Partial) Fix: https://github.com/linkml/linkml/issues/866
- (Full) Fix: https://github.com/linkml/linkml/issues/1012

I also see this was handled in a few different ways in prior PRs/issues:

To me, this points to a broader need to simplify and unify the loading behavior: we have a patchwork of fixes that never quite reached the root of the problem because the loading path is so complex.

One can confirm that network requests are still being made by monitoring network traffic, or by adding a debug statement just before the hbread call that prints what it is about to read.
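For context, a minimal sketch of that kind of debug hook, assuming a wrapper around hbread (the wrapper name is hypothetical, not part of linkml-runtime):

```python
# Illustrative sketch only: log what hbread is about to read before
# delegating to it, so stray network fetches show up while debugging.
import logging

from hbreader import hbread

logger = logging.getLogger(__name__)


def traced_hbread(source, *args, **kwargs):
    # Sources that look like URLs indicate a network request is about to happen
    logger.debug("hbread about to read: %r", source)
    return hbread(source, *args, **kwargs)
```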


Finally took the time to see which network requests were still happening during normal usage, because I kept hanging both on test runs and when just trying to use the tool.

It turns out that hbread doesn't use requests (which would be cached during testing) and calls urllib directly. It also turns out that most of the time we are just requesting types.yaml over and over again, so we can safely use the local copy of the metamodel instead. The local copy should always be the one we prefer, since it is pinned to the particular version of linkml_runtime we're using, whereas the URI version could be any version (i.e. it would be the most recent version even if we wanted to use an older version of the spec).
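A rough sketch of the "prefer local" idea described above; the lookup table, helper, and bundled path below are hypothetical, and the actual resolution logic in this PR is more involved:

```python
# Illustrative sketch only: map known metamodel URIs to the schema files
# bundled with this linkml_runtime release instead of fetching them over
# the network.
from pathlib import Path

# Assumed location of the bundled metamodel schemas (e.g. types.yaml)
LOCAL_METAMODEL_DIR = Path(__file__).parent / "linkml_model" / "model" / "schema"

LOCAL_SCHEMAS = {
    "https://w3id.org/linkml/types": LOCAL_METAMODEL_DIR / "types.yaml",
}


def resolve_source(source: str) -> str:
    """Return a bundled local path for known metamodel URIs, else the source unchanged."""
    local = LOCAL_SCHEMAS.get(source.removesuffix(".yaml"))
    if local is not None and local.exists():
        return str(local)
    return source
```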

Edit: this was removed to satisfy a test that needed the FileInfo:

The only thing that looks dicey in here is that I also try to cache all identical read results for a given cache period, so I avoid passing FileInfo to hbread in order to use lru_cache, which requires hashable args.
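For illustration, a minimal sketch of the since-removed caching approach, assuming only hashable arguments and ignoring the cache-period aspect (the function name is hypothetical):

```python
# Sketch of the caching idea: identical reads within a process are served
# from an in-memory cache. FileInfo is deliberately not passed through
# because lru_cache requires all arguments to be hashable.
from functools import lru_cache

from hbreader import hbread


@lru_cache(maxsize=None)
def cached_hbread(source: str) -> str:
    return hbread(source)
```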

Perf of `request.py(urlopen)`:

| | Cumulative | Per call |
|---|---|---|
| Before | 288.5s | 0.8291s |
| This PR | 18.94s | 0.789s |

Difference: -269s (-93%). The per-call time is roughly unchanged; the point is that we make far fewer calls.

Edit: I have no idea why this test is failing. I tried to fix the source schema file and remove the newline at the end of the file, but otherwise I have no idea why it decided to stop printing the filename between 3 hours ago and now. I'll come back in the morning.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.

Project coverage is 62.92%. Comparing base (27b9158) to head (33ca663). Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| linkml_runtime/loaders/loader_root.py | 72.72% | 3 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #307      +/-   ##
==========================================
+ Coverage   62.88%   62.92%   +0.03%
==========================================
  Files          62       62
  Lines        8528     8545      +17
  Branches     2436     2437       +1
==========================================
+ Hits         5363     5377      +14
- Misses       2554     2557       +3
  Partials      611      611
```

:umbrella: View full report in Codecov by Sentry.

cmungall commented 3 months ago

Thanks! Yes, we should be phasing out the older hbread code over time.