Fully non-allocating versions

stochastics-uni-luebeck / LevyArea.jl

Iterated stochastic integrals in Julia.

https://stochastics-uni-luebeck.github.io/LevyArea.jl/stable

MIT License


Fully non-allocating versions #7

Open ChrisRackauckas opened 2 years ago

ChrisRackauckas commented 2 years ago

This came up in discussions with @frankschae. Many of the core functions allocate. A version that lets the caller pre-build a cache and then reuse it across calls could help performance, particularly in multithreaded contexts.
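To make the request concrete, here is a minimal sketch of the cache-and-reuse pattern. It is not LevyArea.jl code: `AreaCache`, `make_cache`, `toy_area`, and `toy_area!` are hypothetical names, and the antisymmetric product `X*Y' - Y*X'` is only a stand-in for the real computation.

```julia
using LinearAlgebra, Random

# Hypothetical cache type (an assumption for illustration, not part of LevyArea.jl).
struct AreaCache{T}
    X::Matrix{T}   # m×n scratch buffer for random coefficients
    Y::Matrix{T}   # m×n scratch buffer for random coefficients
    A::Matrix{T}   # m×m output buffer
end

# Build all buffers once, up front.
make_cache(::Type{T}, m, n) where {T} =
    AreaCache(zeros(T, m, n), zeros(T, m, n), zeros(T, m, m))

# Allocating version: creates X, Y, and the result on every call.
function toy_area(rng, m, n)
    X = randn(rng, m, n)
    Y = randn(rng, m, n)
    return X * Y' - Y * X'
end

# In-place version: overwrites the buffers held in the cache.
function toy_area!(cache::AreaCache, rng)
    randn!(rng, cache.X)
    randn!(rng, cache.Y)
    mul!(cache.A, cache.X, cache.Y')          # A = X*Y'
    mul!(cache.A, cache.Y, cache.X', -1, 1)   # A = A - Y*X'
    return cache.A
end

cache = make_cache(Float64, 3, 10)
A1 = toy_area(MersenneTwister(1), 3, 10)
A2 = toy_area!(cache, MersenneTwister(1))
```

In a multithreaded setting each task would own its own cache, so no buffers are shared between threads.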

fkastner commented 2 years ago

I did profile some approaches, but hadn't settled on a final design. It turned out that the preallocating version was slower than the current version, so at that point I decided to postpone this (see e.g. https://github.com/JuliaLang/julia/issues/39566). I'm still not entirely sure whether it would be worth the effort.
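A comparison of this kind can be sketched as follows. This is not the benchmark that was actually run; `compare_matmul` is a hypothetical helper that uses only `@allocated` and `@elapsed` from Base, and a package such as BenchmarkTools.jl would give more reliable timings.

```julia
using LinearAlgebra

# Hypothetical helper: compare allocating `A * B` with in-place `mul!(C, A, B)`.
function compare_matmul(n; reps = 1000)
    A = rand(n, n)
    B = rand(n, n)
    C = similar(A)

    # Warm up so compilation is not included in the measurements.
    A * B
    mul!(C, A, B)

    bytes_alloc   = @allocated A * B
    bytes_inplace = @allocated mul!(C, A, B)

    t_alloc = @elapsed for _ in 1:reps
        A * B
    end
    t_inplace = @elapsed for _ in 1:reps
        mul!(C, A, B)
    end

    return (; bytes_alloc, bytes_inplace, t_alloc, t_inplace)
end

result = compare_matmul(64)
println(result)
```

Fewer allocated bytes does not by itself imply a shorter run time, which is why both quantities are reported.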

ChrisRackauckas commented 2 years ago

OpenBLAS has weird results with Ryzen. I would benchmark with MKL to get a better view of the real performance and let people use libblastrampoline on Ryzen.
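As a rough sketch, the active BLAS backend can be inspected before benchmarking. This assumes Julia 1.7 or later, where BLAS calls go through libblastrampoline. The commented-out `using MKL` line assumes the MKL.jl package is installed.

```julia
using LinearAlgebra

# Uncomment to switch the backend to MKL (assumes MKL.jl is installed):
# using MKL

# List the BLAS/LAPACK libraries currently loaded through libblastrampoline.
config = BLAS.get_config()
for lib in config.loaded_libs
    println(lib.libname)
end

println("BLAS threads: ", BLAS.get_num_threads())
```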

fkastner commented 2 years ago

MKL shows the same behaviour for that example.