phoboslab / pl_mpeg

Single file C library for decoding MPEG1 Video and MP2 Audio

Slight performance optimizations #33

Open crudelios opened 1 year ago

crudelios commented 1 year ago

One of the most time-consuming tasks in pl_mpeg is reading from the buffers, largely because every single read checks whether the buffer still has enough data.

This change creates _unchecked versions of plm_buffer_read and plm_buffer_skip, which, as the name implies, do not check how much data is still available.

To compensate, calls to plm_buffer_has have been added in many places where the amount of data needed can be determined beforehand, so all _unchecked reads should be guaranteed to be safe.
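The pattern described above (one up-front availability check followed by several unchecked reads) can be sketched roughly as follows. This is a simplified, hypothetical bit reader written for illustration only: `bit_reader_t`, `reader_has`, `reader_read`, `reader_read_unchecked` and `parse_header` are made-up names, and the field widths are assumptions, not pl_mpeg's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bit reader; NOT pl_mpeg's plm_buffer_t. */
typedef struct {
    const uint8_t *bytes;
    size_t length;    /* number of bytes available */
    size_t bit_index; /* next bit to read, counted from the start */
} bit_reader_t;

/* Returns nonzero if at least `count` bits remain. */
static int reader_has(const bit_reader_t *r, size_t count) {
    return (r->length << 3) - r->bit_index >= count;
}

/* Reads `count` bits (1..32), MSB first, WITHOUT any bounds check.
 * The caller must have verified availability beforehand. */
static uint32_t reader_read_unchecked(bit_reader_t *r, int count) {
    uint32_t value = 0;
    while (count > 0) {
        uint8_t current = r->bytes[r->bit_index >> 3];
        int remaining = 8 - (int)(r->bit_index & 7); /* bits left in byte */
        int take = remaining < count ? remaining : count;
        int shift = remaining - take;
        uint32_t mask = 0xffu >> (8 - take);
        value = (value << take) | ((uint32_t)(current >> shift) & mask);
        r->bit_index += (size_t)take;
        count -= take;
    }
    return value;
}

/* Checked variant: pays for one availability test per read. */
static int reader_read(bit_reader_t *r, int count, uint32_t *out) {
    if (!reader_has(r, (size_t)count)) {
        return 0;
    }
    *out = reader_read_unchecked(r, count);
    return 1;
}

/* Three fixed-width fields (12 + 12 + 4 = 28 bits, widths assumed for
 * the example) guarded by a single check instead of three. */
static int parse_header(bit_reader_t *r, uint32_t *width,
                        uint32_t *height, uint32_t *code) {
    if (!reader_has(r, 28)) {
        return 0;
    }
    *width = reader_read_unchecked(r, 12);
    *height = reader_read_unchecked(r, 12);
    *code = reader_read_unchecked(r, 4);
    return 1;
}
```

The saving comes from hoisting the check: whenever the total size of a group of fields is known in advance, one comparison replaces one per field.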

I also added three helpers: plm_buffer_is_aligned, which checks whether the current bit position is aligned to a byte; plm_buffer_read_byte, which checks that enough buffer data is available and that the position is byte-aligned; and plm_buffer_read_byte_unchecked, which reads the byte directly from the buffer without checking the remaining buffer length or the bit alignment.
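As a rough illustration of how such byte helpers could fit together, here is a minimal sketch. The `bb_` names and the struct are hypothetical, and returning -1 on failure is an assumption made for the example; the PR text does not say what plm_buffer_read_byte does when its checks fail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit buffer for illustration; NOT pl_mpeg's plm_buffer_t. */
typedef struct {
    const uint8_t *bytes;
    size_t length;    /* number of bytes available */
    size_t bit_index; /* current position in bits */
} bb_t;

/* True when the bit position sits exactly on a byte boundary. */
static int bb_is_aligned(const bb_t *b) {
    return (b->bit_index & 7) == 0;
}

/* True when at least `count` bits remain. */
static int bb_has(const bb_t *b, size_t count) {
    return (b->length << 3) - b->bit_index >= count;
}

/* Direct byte fetch: no length check, no alignment check.
 * On an aligned position this needs no shifting or masking. */
static uint8_t bb_read_byte_unchecked(bb_t *b) {
    uint8_t value = b->bytes[b->bit_index >> 3];
    b->bit_index += 8;
    return value;
}

/* Checked variant. Returning -1 on failure is an assumption of this
 * sketch, not documented behaviour of the real library. */
static int bb_read_byte(bb_t *b) {
    if (!bb_is_aligned(b) || !bb_has(b, 8)) {
        return -1;
    }
    return bb_read_byte_unchecked(b);
}
```

The unchecked variant is only meaningful when the caller already knows the position is byte-aligned, for example right after consuming a byte-aligned marker.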

A very small optimization to plm_video_idct was also added: it avoids a sign flip in the y7 calculation by swapping all the remaining signs instead.
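The general idea behind this kind of change can be shown with a generic example. This is not the actual plm_video_idct code; the function names, variables and expressions below are hypothetical and only demonstrate that a negation can be dropped if every later use of the value swaps its sign.

```c
/* Before: y is negated when it is computed. */
static void combine_before(int a, int b, int c, int out[2]) {
    int y = -a - b; /* extra negation */
    out[0] = c - y;
    out[1] = c + y;
}

/* After: y keeps its natural sign and the uses swap + and -.
 * The results are identical, with one negation fewer. */
static void combine_after(int a, int b, int c, int out[2]) {
    int y = a + b;
    out[0] = c + y;
    out[1] = c - y;
}
```

Whether a compiler already performs this rewrite depends on the surrounding code, so the benefit is best confirmed by measurement.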

Some warnings specific to Visual Studio were also removed.

Overall, this yields a 5% to 7% performance improvement in my test cases.

As a note, I tried fiddling with SIMD, especially in plm_video_idct. I did get it to work, but the performance was either worse (using SSE4.1) or only marginally (<1%) better (with AVX2), so I scrapped that idea.