stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License

Speed up Location.of_position_opt #1302

Closed · WardBrian closed this pull request 1 year ago

WardBrian commented 1 year ago

I have been playing around with profiling using Landmarks and I found something kind of surprising: Location.of_position_opt is currently a pretty massive time waster in the compiler.

Of the ~78 million clock cycles spent in Parse.parse_file for the model I was benchmarking against, ~66 million of them are just calls to Location.of_position_opt, because basically every token wants to turn a Lexing.position into a Location_span.t. And we're using an unnecessary and slow regex search in every one of those calls.
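The hot-path cost here is the difference between invoking a general-purpose regex engine per token versus a direct string scan. As a rough, language-agnostic illustration (in Python rather than the compiler's OCaml, and with a made-up location-string format, not stanc3's actual one), the two approaches look like this:

```python
import re
import timeit

# Hypothetical location string; the real stanc3 format differs.
loc = 'file "model.stan", line 42, column 7'
pattern = re.compile(r"line (\d+)")

def with_regex(s):
    # Regex engine: general matching machinery runs on every call.
    return int(pattern.search(s).group(1))

def with_find(s):
    # Direct scan: find the fixed marker, then slice up to the next comma.
    start = s.index("line ") + len("line ")
    end = s.index(",", start)
    return int(s[start:end])

assert with_regex(loc) == with_find(loc) == 42

regex_t = timeit.timeit(lambda: with_regex(loc), number=100_000)
find_t = timeit.timeit(lambda: with_find(loc), number=100_000)
print(f"regex: {regex_t:.3f}s  find: {find_t:.3f}s")
```

Per call the difference is small, but multiplied by a lookup on nearly every token it adds up, which is the effect the profile above shows.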

This rewrites that function so it no longer needs a regex search. This changes the profiler output (name; location; calls; cycles) from

Location.of_position_opt; src/middle/Location.ml:122; 941; 66.38M

to

Location.of_position_opt; src/middle/Location.ml:124; 941; 911.59K

For the motherHOF.stan test file this change nearly halves the wall-clock time for the compiler to run.

Submission Checklist

Release notes

Speeds up some code which is critical to parsing.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

nhuurre commented 1 year ago

Location.of_string_opt also does a regex search. Maybe that's worth optimizing too? Otherwise, the problem recurs with a Stan file like

#include motherHOF.stan

Actually, it seems really expensive to repeatedly parse the include_from location from the filename string. If you could store the include_from position in include_stack or something, you wouldn't need to parse it at all. I'm not sure how the parser interacts with global state, though.
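The caching idea above could be sketched like this (a hypothetical Python sketch, not stanc3's actual parser state; every name here is invented for illustration): the include stack carries structured positions, so the include_from location never has to be recovered from a filename string at all.

```python
# Hypothetical sketch: keep the structured include site on the stack
# instead of encoding it into the filename string and re-parsing it.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Position:
    filename: str
    line: int
    column: int
    included_from: Optional["Position"] = None  # stored, never re-parsed

class IncludeStack:
    def __init__(self) -> None:
        self._stack: list[Position] = []

    def push(self, include_site: Position) -> None:
        # Record where the #include appeared when entering the new file.
        self._stack.append(include_site)

    def pop(self) -> Position:
        return self._stack.pop()

    def current_include_site(self) -> Optional[Position]:
        return self._stack[-1] if self._stack else None

# Entering `#include motherHOF.stan` from line 3 of the including file:
stack = IncludeStack()
stack.push(Position("main.stan", 3, 1))
loc = Position("motherHOF.stan", 1, 1,
               included_from=stack.current_include_site())
assert loc.included_from.filename == "main.stan"  # no string parsing needed
```

The open question from the comment remains: whether the lexer/parser can thread this state through cleanly, given how it currently interacts with global state.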

WardBrian commented 1 year ago

The regex used in of_string_opt is actually doing much more interesting work, since it's using |, but it would also be worth taking a look at. It's generally a much colder code path, even when I profile a model that does use #include.

andrjohns commented 1 year ago

Chiming in to say that I'm already seeing a benefit from this PR!

I'm looking at replacing rstan's V8 dependency with a lightweight engine we can bundle and this PR reduced the parse time for a heavy model from 15s -> 5s!

WardBrian commented 1 year ago

Nice! I’ve been meaning to do some more profiling on expensive models to see if there are obvious performance wins still out there.

Last time I checked, after this PR, a decent chunk of time for a lot of models was just loading the list of Stan math library signatures.