tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.43k stars 93 forks source link

[Optimization] Implement `prepare_custom_contract` natively #2008

Closed yannham closed 3 months ago

yannham commented 4 months ago

Motivation and content

After the rework of the contract system that sets enum as the default return type of most custom and builtin contracts, some additional glue code was needed to post-process such enums (and raise blame accordingly). This has been implemented in the new internal function $prepare_custom_contract.

While this function wasn't very long or involved, measurements showed that contract application is a code path that is taken so much that this additional blob of pure Nickel slowed down real world example by potentially a huge factor (up to 5x).

While this is the symptom of other inefficiencies related to the contract system in general (related to pattern matching, missed caching opportunities for contract generation, etc.), bringing back the logic of $prepare_custom_contract to native code brings back the overhead (master compared to latest stable 1.7) to something like 1.1x or 1.2x.

This commit implements this strategy, by introducing two new primops:

Additionally, this commit tries to avoid generating terms at run-time, but rather setup the stack directly, to further make this hot path faster.

Overall, it's a very ad-hoc optimization, which is a little sad and makes the Rust code a bit heavier. We should strive for general optimizations or architectural improvements that would be global (and not just for contract application), but in this case, the measures show that contract/apply is just run so much that a specialization is mandated. It doesn't prevent us from seeking more general optimizations.

Stack traces

Some snapshot tests show that the stack trace seems to be much shorter than previously. I'm not sure if it's the error reporting code that decides to cut them, or the stack elements that we generate at run-time make some of the callstack analysis heuristics fail. In any case, I don't have a simple solution, and honestly the callstack handling has been rather dark magic until then - maybe it's time to fix it in a more systematic rework of this part,