[perf] Use yamllib if available

sneakers-the-rat commented 4 months ago

Loading YAML is actually really slow in pure Python! And we load a lot of YAML!

There's a free perf boost we can get by just using the compiled LibYAML versions of the loaders if they are available (they usually are).
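A minimal sketch of the "use the compiled loader if available" pattern, assuming PyYAML is installed. This is not the code from this PR, and the helper name `load_yaml` is hypothetical. PyYAML only defines the LibYAML-backed classes such as `yaml.CSafeLoader` when it was built against LibYAML, so the import can fail and we fall back to the pure-Python `yaml.SafeLoader`.

```python
import yaml

# Prefer the compiled LibYAML-backed loader. PyYAML only defines
# yaml.CSafeLoader when it was built with LibYAML, so the import can fail.
try:
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    # Pure-Python fallback: same results, just slower.
    from yaml import SafeLoader


def load_yaml(text: str):
    """Parse YAML with the fastest available safe loader (hypothetical helper)."""
    return yaml.load(text, Loader=SafeLoader)
```

For example, `load_yaml("id: my_schema\nclasses:\n  Person: {}\n")` returns `{'id': 'my_schema', 'classes': {'Person': {}}}` whichever loader was picked; `yaml.__with_libyaml__` reports whether the C bindings are present.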

The only failure I noticed was a single `\t` character at the start of a multiline string in the biolink model that caused a parsing error; once that was fixed, the results were exactly the same, just way faster.

For `YAMLLoader.load_as_dict` in the main linkml tests (cumulative time in seconds, run with `--with-slow`):

| | Total test time | `load_as_dict` | per call |
|---|---|---|---|
| Before PR | 1527 s | 451.3 s | 0.1136 s |
| This PR | 1334 s | 243.9 s | 0.062 s |
| Change | -193 s | -207 s | -0.0516 s |
| Change (%) | -12.6% | -46% | |

all ~ for free ~
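A rough way to check the "same output, just faster" claim outside the test suite is to time both loaders on the same document and confirm they agree. This is a sketch, assuming PyYAML is installed; the schema-like document is synthetic and made up for illustration (not the biolink model), `make_doc` and `compare_loaders` are hypothetical helpers, and the C loader is simply skipped when LibYAML is unavailable.

```python
import timeit

import yaml


def make_doc(n_classes: int = 200) -> str:
    """Build a synthetic, schema-like YAML document (hypothetical test data)."""
    lines = ["id: https://example.org/test", "classes:"]
    for i in range(n_classes):
        lines.append(f"  Class{i}:")
        lines.append(f"    description: Synthetic class number {i}")
        lines.append("    slots: [id, name, value]")
    return "\n".join(lines) + "\n"


def compare_loaders(text: str, number: int = 5) -> dict:
    """Return mean seconds per load for each available safe loader."""
    loaders = {"SafeLoader": yaml.SafeLoader}
    if hasattr(yaml, "CSafeLoader"):  # only present when built with LibYAML
        loaders["CSafeLoader"] = yaml.CSafeLoader

    results = {}
    reference = None
    for name, loader in loaders.items():
        data = yaml.load(text, Loader=loader)
        if reference is None:
            reference = data
        elif data != reference:
            # The compiled loader should be a drop-in replacement.
            raise ValueError(f"{name} produced different output")
        # timeit runs immediately, so the lambda sees the current loader.
        total = timeit.timeit(lambda: yaml.load(text, Loader=loader), number=number)
        results[name] = total / number
    return results


if __name__ == "__main__":
    for name, per_call in compare_loaders(make_doc()).items():
        print(f"{name}: {per_call:.4f} s per call")
```

Absolute numbers will depend on the machine and document; the point of the sketch is the side-by-side comparison on identical input.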

Same as https://github.com/linkml/linkml-runtime/pull/307: I have no idea why these tests are failing. I didn't touch anything near that test, this is a totally orthogonal change to #307 yet it hits the same error, and one of the tests here miraculously passed, so I think this is a flaky test rather than anything I did.

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 62.90%. Comparing base (27b9158) to head (82fcfa6). Report is 3 commits behind head on main.


```diff
@@            Coverage Diff             @@
##             main     #306      +/-   ##
==========================================
+ Coverage   62.88%   62.90%   +0.01%
==========================================
  Files          62       62
  Lines        8528     8532       +4
  Branches     2436     2436
==========================================
+ Hits         5363     5367       +4
  Misses       2554     2554
  Partials      611      611
```


cmungall commented 4 months ago

thx!

Looks like one failing test

We also want to consider whether to incorporate ruamel.yaml into our strategy. We use it in some places because it has nice features like comment preservation (very useful for diff/patch workflows). Not sure how it compares on performance.

https://yaml.readthedocs.io/en/latest/pyyaml/

We currently have incomplete behavior with decimals using PyYAML: https://github.com/yaml/pyyaml/issues/255

See `test_type_range` in the compliance tests.
