manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Add new AVX/SSE calls for cos/sin functions #310

Closed vidirsic closed 2 months ago

vidirsic commented 7 months ago

I have been playing around with the Corrfunc package to measure correlation functions in cosmological survey data, but I am running into some technical issues that I was hoping you could advise on. In particular, I am trying to add a different weighting scheme for AVX and SSE, in which I have to call sin or cos of an argument. I am compiling with the GNU compiler, and I tried to include the relevant functions as shown below (e.g. for SSE). However, this gives either undefined references to the functions, or unknown functions if I do not declare them at the beginning of the relevant .h file (e.g. sse_calls.h). I must be missing something obvious, and I was hoping you might have a solution, seeing that you have implemented many other functions that are compatible with SSE and AVX.

Edited: I have also tried the Intel compiler (icc (ICC) 15.0.3 20150407), but I encounter different problems when compiling. I suspect that SVML or MKL is not properly linked in that case.

Thank you for any help and advice!

Example code:

/* sse_calls.h */

/* SSE */
__m128d _ZGVbN2v_sin(__m128d x);   /* _mm_sin_pd(x) */
__m128  _ZGVbN4v_sinf(__m128 x);   /* _mm_sin_ps(x) */

#ifndef DOUBLE_PREC
#  ifdef __INTEL_COMPILER
#    define SSE_SINE(X)   _mm_sin_ps(X)
#  else
#    define SSE_SINE(X)   _ZGVbN4v_sinf(X)
#  endif
#else
#  ifdef __INTEL_COMPILER
#    define SSE_SINE(X)   _mm_sin_pd(X)
#  else
#    define SSE_SINE(X)   _ZGVbN2v_sin(X)
#  endif
#endif
manodeep commented 7 months ago

Thanks for opening the issue. Your best bet would be to use the Intel C compiler with SVML or MKL. Then _mm_sin_ps/_mm_sin_pd should work for SSE; you will need a similar block with _mm256_sin_ps/_mm256_sin_pd for AVX and _mm512_sin_ps/_mm512_sin_pd for AVX512F.

While it is not used at the moment, common.mk contains the code to compile with MKL. See if that works for you, or check out the Intel MKL link-line advisor for your setup. If neither works, please post your compile log and the error you get.

If you want to go the difficult route, then you can use something like this library for SSE, and perhaps this AVX one. I haven't used either one personally - so can't vouch for usability or accuracy.
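For a rough idea of what such a hand-rolled routine involves, here is a minimal sketch of a single-precision sine that uses only SSE2 intrinsics. It is not taken from either library mentioned above, nor from Corrfunc; the names sketch_sin_ps and sketch_sin_array are hypothetical. It assumes moderate |x| (the range reduction loses accuracy for large arguments), does not handle NaN or infinity, and uses a plain Taylor polynomial rather than a tuned minimax fit, so treat the accuracy as approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical helper (not part of Corrfunc): approximate sin(x) for four
 * packed floats. Assumes moderate |x| and finite inputs. */
static inline __m128 sketch_sin_ps(__m128 x)
{
    const __m128 inv_pi = _mm_set1_ps(0.318309886f);
    const __m128 pi     = _mm_set1_ps(3.14159265f);

    /* Range reduction: write x = k*pi + r with r in [-pi/2, pi/2].
     * _mm_cvtps_epi32 rounds to nearest under the default rounding mode. */
    const __m128i k  = _mm_cvtps_epi32(_mm_mul_ps(x, inv_pi));
    const __m128  r  = _mm_sub_ps(x, _mm_mul_ps(_mm_cvtepi32_ps(k), pi));
    const __m128  r2 = _mm_mul_ps(r, r);

    /* Taylor series of sin(r)/r up to the r^10 term, in Horner form. */
    __m128 p = _mm_set1_ps(-1.0f / 39916800.0f);
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f / 362880.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.0f / 5040.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f / 120.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.0f / 6.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f));
    p = _mm_mul_ps(p, r);

    /* sin(k*pi + r) = (-1)^k * sin(r): move the low bit of k into the
     * sign-bit position and XOR it into the result. */
    const __m128 sign = _mm_castsi128_ps(_mm_slli_epi32(k, 31));
    return _mm_xor_ps(p, sign);
}

/* Hypothetical convenience wrapper: apply the sketch to an array whose
 * length is a multiple of 4 (unaligned loads and stores). */
static void sketch_sin_array(const float *in, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        _mm_storeu_ps(out + i, sketch_sin_ps(_mm_loadu_ps(in + i)));
    }
}
```

A macro such as SSE_SINE(X) could in principle point at a routine like this when no vendor math library is available, but accuracy and speed would need to be checked against the compiler-provided versions first.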

Hope this helps!

manodeep commented 2 months ago

@vidirsic Did you manage to resolve the issue?

vidirsic commented 2 months ago

Dear Manodeep,

It has been a while, and I can dig up all the details if needed, but here is a quick summary of what I recall. The common.mk setup did not seem to work for me, and I could not get Intel MKL linked correctly on my system at the time. All in all this seemed a promising route, and it most likely only required more time on my part to figure out the local server setup and configure it. However, I was a bit pressed for time back then and decided to simply use a different platform for the calculations we wanted to do. Now that the publication is out, we might come back to explore more optimal codes, but I don't have a particular timescale for that.

Hope this helps, and sorry for any inconvenience. Feel free to close the ticket.

Best, Vid

manodeep commented 2 months ago

Ohh not a problem at all! I am glad that you figured out a way around the issue.

Please do report any such issue - it helps us to improve the code for everyone :)