zpzim / SCAMP

The fastest way to compute matrix profiles on CPU and GPU!
http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
MIT License

Optimize CPU kernels for MSVC and Apple Clang #73

Closed zpzim closed 2 years ago

zpzim commented 2 years ago

Performance on MSVC and Apple Clang is about 3-5x worse than with gcc or clang on Linux. We should attempt to bring them to parity. The gap is likely related to autovectorization differences between the compilers, and the following resources should help us debug it:

  1. https://llvm.org/docs/Vectorizers.html
  2. https://docs.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-170#auto-vectorizer
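
As a rough illustration of how the vectorizer reports described in those two documents could be used, the sketch below is a hypothetical stand-in loop, not code from SCAMP. The function name `scaled_add` and its parameters are assumptions made for the example. Compiling it with each compiler's report flag shows whether that compiler vectorized the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hot loop; NOT taken from the SCAMP kernels.
// Flags that ask each compiler to report on loop vectorization:
//   clang / Apple Clang: -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize
//   gcc:                 -O3 -fopt-info-vec-missed
//   MSVC:                /O2 /Qvec-report:2
void scaled_add(const double* a, const double* b, double* out,
                std::size_t n, double scale) {
  // Simple unit-stride loop with no conditionals: a typical candidate
  // for autovectorization on all of the compilers above.
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] * scale + b[i];
  }
}
```

Comparing the reports for the same source file across compilers is one way to find the loops where MSVC or Apple Clang gives up while gcc or clang on Linux does not.
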

zpzim commented 2 years ago

#78 significantly narrows this performance gap, which should now be around 2x instead of 3-5x.

zpzim commented 2 years ago

MSVC is still lagging behind the pack at approximately 4x worse than the rest. I don't expect it to be possible to bring MSVC to performance parity without using intrinsics in the kernel directly. It may be possible to use something like Eigen for this.

The main issue is that MSVC's autovectorizer is very simplistic and cannot handle conditionals or irregular access patterns. The current optimizations are probably the best we can do without completely rewriting the implementation with intrinsics.
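
To make the point about conditionals concrete, here is a minimal sketch of the kind of rewrite involved, assuming a simplified "keep the best correlation and its index" update. The function names, the `candidate` parameter, and the array layout are all hypothetical and are not SCAMP's actual kernel. The first version uses a conditional store inside the loop; the second expresses the same update as unconditional selects, which is generally a friendlier shape for a simple autovectorizer. Whether a given compiler actually vectorizes either form should be checked with its vectorization report.

```cpp
#include <cstddef>

// Hypothetical illustration only; NOT the SCAMP kernel.

// Branchy form: the stores happen only when the condition holds.
void update_branchy(const float* corr, float* profile, int* index,
                    int candidate, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (corr[i] > profile[i]) {
      profile[i] = corr[i];
      index[i] = candidate;
    }
  }
}

// Select form: every iteration stores, and the condition only picks
// which value is written. The results are identical to the branchy form.
void update_select(const float* corr, float* profile, int* index,
                   int candidate, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const bool better = corr[i] > profile[i];
    profile[i] = better ? corr[i] : profile[i];
    index[i] = better ? candidate : index[i];
  }
}
```

Both functions produce the same output, so the select form can be swapped in and benchmarked per compiler without changing results.
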

zpzim commented 2 years ago

I spent some time trying to use Eigen in the CPU kernels for better cross-platform performance. Preliminary testing shows that it works well and allows us to make the CPU kernel code a lot simpler.

Unfortunately, MSVC still lags behind at ~3x slower, but there is improvement across all platforms. I will work on merging the branch when I have some more time.

zpzim commented 2 years ago

Eigen support has been added as of v2.1.3. There are still improvements to be made for all profile types on Windows and on all platforms for the SUM_THRESH and MATRIX_SUMMARY types.