xiph / opus

Modern audio compression for the internet.
https://opus-codec.org/
Other
2.17k stars 583 forks source link

Runtime crash with SSE byte alignments in MinGW #105

Open Troid92 opened 5 years ago

Troid92 commented 5 years ago

Hi, I am trying to use opus with MinGW. I have compiled successfully using ./configure; make and all tests are passing, but when I decode an opus stream in my own program (which can be reproduced by virtually any program that decodes an opus stream), it crashes with error code 0xC0000005 (memory access violation) in the file celt/x86/pitch_sse.c, in the function comb_filter_const_sse(), specifically at the very first call to _mm_load1_ps.

I am unfamiliar with architecture optimizations like this, but some research shows that _mm_load1_ps expects a 32-bit float const* which it duplicates into 128 bits of storage. However, comb_filter_const_sse() is instead passing a 16-bit opus_val16* which to me indicates that an additional 16 bits are getting copied that the program might not own which would explain the access violation.

I've verified that test_opus_decode passes and does indeed use comb_filter_const_sse() without crashing, so perhaps there is something different about the test environment vs. a practical application that allows this issue to slip by. If that's the case, the test environment cannot be trusted completely in its current state... That or I'm interpreting this code incorrectly and there is some other issue with my opus build.

If I change the generated config.h to not define OPUS_X86_MAY_HAVE_SSE then opus works perfectly fine on my system, but preferably opus should work out of the box, with SSE capabilities.

jmvalin commented 5 years ago

The values that get loaded with _mm_load1_ps() are of type opus_val16. When Opus if compiled as floating-point, then that type is defined as float, so there should be no problem. When Opus is compiled as fixed-point, the type is defined as short, which would be a problem... except that the SSE code you're pointing at should never be used for fixed-point. So that seems to point to some sort of build issue. What configure options did you use?

Troid92 commented 5 years ago

Ah, I see. I was confused by the misleading type name that has a prominent 16. Here is the default config.h file generated with ./configure on my system. So something in here must be contradictory...? config.h.txt As mentioned removing OPUS_X86_MAY_HAVE_SSE did the trick, as far as I can tell, but it sounds like there might be a more correct recommendation.

jmvalin commented 5 years ago

Well, removing OPUS_X86_MAY_HAVE_SSE will just not use that function at all, so it doesn't tell us much. You'd need to use a debugger to figure out exactly why it crashes. For example, what's the address the instruction is trying to load?

Troid92 commented 5 years ago

Okay -- I was just checking that there was nothing contradictory in the config file that would lead to that function's accidental inclusion when it shouldn't be.

I believe I have pinned the issue down, thanks to this useful article about MinGW's confusion with function callback byte alignments: MinGW SSE crash due to stack mis-alignment

In the Opus test environment, the variables were always 16-byte-aligned in memory, whereas in my particular application they were consistently offset by +4 bytes from 16-alignment, even if I manually flagged them to 16-align. _mm_load1_ps has no 16-alignment requirement, but apparently that flexibility is supposed to be handled by the compiler which isn't anticipating this callback scenario. There are two solutions offered:

  1. Tell GCC to assume the stack is not 16-byte aligned and fix it up on entry to every function. To do this, pass the -mstackrealign flag to GCC.
  2. Tell GCC to fix up the stack on each of the possible entry points. To do this, mark every entry point function in your code with __attribute__((force_align_arg_pointer)).

He recommends option 2 for efficiency, and I can confirm that option 2 works on my end. Defining SSE_ALIGN as __attribute__((force_align_arg_pointer)) and declaring the function as void SSE_ALIGN comb_filter_const_sse(...) avoids memory access violations, even with the variables still showing +4 offsets, and my opus file decodes correctly. I do not know if any other functions in the database would need this flag. I have not tested option 1. It appears to be a MinGW-specific issue, so hopefully a few extra preprocessor lines to solve this quirk wouldn't make things too hairy...

I'm not very well-studied in SSE or other architecture features so I apologize if I am not totally accurate in my understanding or explanation, but hopefully this is enough for someone experienced with these files to fix the issue. Everything else seems to be fine and dandy in Opus's MinGW support, and my setup is running as intended with SSE support now.

jmvalin commented 5 years ago

Can you see if you're able to get it working only by changing the definition of the OPUS_EXPORT macro (see include/opus_defines.h)?