jamoma / JamomaCore

Jamoma Frameworks for Audio and Control Structure

replacing TTAntidenormal code with flush-to-zero Intrinsic functions #56

Open hems opened 11 years ago

hems commented 11 years ago
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

this needs xmmintrin.h and pmmintrin.h

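they flush very small numbers to zero before a denormal can even appear: flush-to-zero (FTZ) replaces a denormal result with zero, and denormals-are-zero (DAZ) treats a denormal input as zero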
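A minimal sketch of how the two macros could be checked, assuming an x86 target where float arithmetic goes through SSE (the default on x86-64) and a compiler that ships both headers. The helper names `enable_flush_to_zero` and `half_of_smallest_normal` are made up for illustration and are not part of Jamoma.

```c
#include <float.h>
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Hypothetical helper: turn on FTZ and DAZ for the calling thread. */
static void enable_flush_to_zero(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

/* Hypothetical helper: halve the smallest normal float. The volatile
   operands force the multiply to happen at run time, so the result
   depends on the current MXCSR flags rather than on constant folding. */
static float half_of_smallest_normal(void)
{
    volatile float smallest = FLT_MIN;
    volatile float half = 0.5f;
    return smallest * half;
}
```

With the default flags the product is a denormal and therefore nonzero; after calling `enable_flush_to_zero()` the same multiply should come back as exactly zero.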

the following does the same thing:

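unsigned int mxcsr;
__asm__ __volatile__ ("stmxcsr (%0)" : : "r"(&mxcsr) : "memory");  /* read MXCSR */
mxcsr = (mxcsr | (1<<15) | (1<<6));  /* bit 15 = FTZ, bit 6 = DAZ */
__asm__ __volatile__ ("ldmxcsr (%0)" : : "r"(&mxcsr) : "memory");  /* write it back; the clobber makes sure mxcsr is stored first */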

it should be faster than our TTAntidenormal() code, because we do an isnormal() check, which only returns 0 once a denormal is already in the register

they are gcc macros and it is not clear what to do for Windows, hence this suggestion:

Tim Place (11-05-25 4:04 PM): What I think we can do is make TTAntiDenormal() a macro that does nothing when compiled with GCC for Intel processors. Then it will still be there if needed in other contexts.
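A rough sketch of what such a macro could look like. The name `TT_ANTI_DENORMAL_SKETCH` and the software fallback are assumptions made for illustration; the real TTAntiDenormal() implementation is not shown in this thread, and the sketch assumes the FTZ/DAZ flags are set somewhere else on the GCC/Intel path.

```c
#include <math.h>

/* Hypothetical sketch, not the Jamoma API. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* GCC on Intel: the hardware flags are assumed to be set elsewhere,
       so the per-sample check compiles to nothing. */
    #define TT_ANTI_DENORMAL_SKETCH(value) ((void)0)
#else
    /* Any other context: zero the variable in software if it holds a
       subnormal value. */
    #define TT_ANTI_DENORMAL_SKETCH(value) \
        do { if (fpclassify(value) == FP_SUBNORMAL) (value) = 0; } while (0)
#endif
```

Call sites stay unchanged either way, e.g. `TT_ANTI_DENORMAL_SKETCH(sample);` inside a processing loop, which is the point of keeping the macro around.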

see more on the original redmine issue http://redmine.jamoma.org/issues/799

tap commented 9 years ago

From Redmine, I wrote:

I think that these macros will work on Windows, but it is unclear when they should be called. I'll explain:

They set a bit in the processor that says to zero denormals. If your machine has multiple processors/cores, which I'm betting it does, how do you set it on all of the processors? What happens when the OS switches to another application, which then sets the bit the other way, and then when it comes back to you the bit isn't set anymore? But then you don't want to set it all of the time because this will cost something time-wise. How expensive is this call?

I don't know the answer to these questions.

You could imagine setting the bit when a vector starts processing in TTAudioObject. But what if the vector size is really really small? Then this bit is getting set constantly and burning CPU. Particularly if you have a chain of TTAudioObjects, why set the bit for every single one?
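One possible way to soften the "set constantly" concern, sketched under assumptions: read MXCSR first and only write it when the flags are missing. The function name `ensure_ftz_daz_sketch` is hypothetical, the mask 0x8040 assumes a CPU that supports DAZ (SSE2 or later), and whether the read is actually cheaper than an unconditional write has not been measured here.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

#define FTZ_DAZ_BITS 0x8040u  /* bit 15 = FTZ, bit 6 = DAZ */

/* Hypothetical helper: returns 1 if MXCSR had to be written,
   0 if both flags were already set for this thread. */
static int ensure_ftz_daz_sketch(void)
{
    const unsigned int csr = _mm_getcsr();
    if ((csr & FTZ_DAZ_BITS) == FTZ_DAZ_BITS)
        return 0;
    _mm_setcsr(csr | FTZ_DAZ_BITS);
    return 1;
}
```

In a chain of objects, only the first call on a given thread would write the register; the later calls would fall through after the read.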

We could make the bit settable using a message to the TTEnvironment object, which would allow you to manually set the bit.

Perhaps, in the AudioGraph case, we could say that the AudioGraph's context is responsible for setting it when it issues a preprocess call. That won't do anything for us if the objects are used outside of the AudioGraph, but maybe it can serve as a model for what to do with DSP or Matrix operators?

Then Nils wrote:

I just saw that in the Faust vst architecture file:

// On Intel set FZ (Flush to Zero) and DAZ (Denormals Are Zero)
// flags to avoid costly denormals
#ifdef __SSE__
    #include <xmmintrin.h>
    #ifdef __SSE2__
        #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8040)
    #else
        #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8000)
    #endif
#else
    #define AVOIDDENORMALS 
#endif

The ICC compiler has a -ftz flag (flush to zero). On GCC, the -ffast-math and -funsafe-math-optimizations flags include flush-to-zero operation.

also interesting: http://stackoverflow.com/questions/2487653/avoiding-denormal-values-in-c