One of the most time-consuming tasks in pl_mpeg is reading from the buffers, largely because every single read checked whether the buffer still had enough data.
This change adds _unchecked versions of plm_buffer_read and plm_buffer_skip, which, as the name implies, do not check how much data is left.
To compensate, calls to plm_buffer_has have been added in the many places where the required amount of data can be determined beforehand, so all _unchecked reads should be safe.
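To illustrate the pattern, here is a minimal sketch in C. It does not use pl_mpeg's actual types: bitbuf_t, bitbuf_has, bitbuf_read, bitbuf_read_unchecked and bitbuf_skip_unchecked are hypothetical stand-ins for plm_buffer_t and the functions named above, and read_header is an invented caller whose field widths are modelled on the start of an MPEG-1 sequence header. The point is that one length check up front covers a whole run of unchecked reads and skips, instead of one check per read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for plm_buffer_t. */
typedef struct {
    const uint8_t *bytes; /* buffered data */
    size_t length;        /* number of valid bytes */
    size_t bit_index;     /* read position, in bits */
} bitbuf_t;

/* Nonzero if at least `count` unread bits remain. */
static int bitbuf_has(const bitbuf_t *b, size_t count) {
    return (b->length << 3) - b->bit_index >= count;
}

/* Reads `count` bits (0..32), MSB first. No bounds check: the caller must
   already know that bitbuf_has(b, count) holds. */
static uint32_t bitbuf_read_unchecked(bitbuf_t *b, int count) {
    assert(count >= 0 && count <= 32); /* argument sanity only, debug builds */
    uint32_t value = 0;
    while (count > 0) {
        uint32_t byte = b->bytes[b->bit_index >> 3];
        int remaining = 8 - (int)(b->bit_index & 7); /* unread bits in this byte */
        int take = remaining < count ? remaining : count;
        int shift = remaining - take;
        value = (value << take) | ((byte >> shift) & (0xFFu >> (8 - take)));
        b->bit_index += (size_t)take;
        count -= take;
    }
    return value;
}

/* Advances the cursor without a bounds check. */
static void bitbuf_skip_unchecked(bitbuf_t *b, size_t count) {
    b->bit_index += count;
}

/* Checked read: pays for a length check on every call, returns 0 on underrun. */
static uint32_t bitbuf_read(bitbuf_t *b, int count) {
    if (!bitbuf_has(b, (size_t)count)) {
        return 0;
    }
    return bitbuf_read_unchecked(b, count);
}

/* Hypothetical caller with a fixed 12 + 12 + 4 + 4 bit layout. */
typedef struct { uint32_t width, height, rate; } header_t;

static int read_header(bitbuf_t *b, header_t *h) {
    if (!bitbuf_has(b, 12 + 12 + 4 + 4)) {
        return 0; /* not enough data yet; cursor untouched */
    }
    /* Every read below is covered by the single check above. */
    h->width  = bitbuf_read_unchecked(b, 12);
    h->height = bitbuf_read_unchecked(b, 12);
    bitbuf_skip_unchecked(b, 4); /* 4-bit field not needed here */
    h->rate   = bitbuf_read_unchecked(b, 4);
    return 1;
}
```

For example, the bytes 0x16 0x00 0xF0 0x13 decode to a width of 352, a height of 240 and a rate field of 3, leaving the cursor at bit 32. A buffer holding only the first three bytes makes read_header return 0 without moving the cursor.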
I also added three helpers: plm_buffer_is_aligned, which checks whether the read position is aligned to a byte boundary; plm_buffer_read_byte, which checks both that enough data is available and that the position is byte-aligned; and plm_buffer_read_byte_unchecked, which reads the byte directly from the buffer without checking the remaining length or the alignment.
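The following sketch shows what these three helpers could look like on a simplified buffer. bitbuf_t and all bitbuf_* names are hypothetical stand-ins, not pl_mpeg's code, and two behaviours are assumptions the PR does not specify: the checked byte read returns -1 when fewer than 8 bits remain, and it falls back to a bit-by-bit read when the cursor is not byte-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for plm_buffer_t. */
typedef struct {
    const uint8_t *bytes; /* buffered data */
    size_t length;        /* number of valid bytes */
    size_t bit_index;     /* read position, in bits */
} bitbuf_t;

static int bitbuf_has(const bitbuf_t *b, size_t count) {
    return (b->length << 3) - b->bit_index >= count;
}

/* True when the cursor sits on a byte boundary. */
static int bitbuf_is_aligned(const bitbuf_t *b) {
    return (b->bit_index & 7) == 0;
}

/* Fast path: a plain array load. The caller guarantees both length and
   alignment; the assert is an optional debug-only guard added in this
   sketch and compiles away under NDEBUG. */
static uint8_t bitbuf_read_byte_unchecked(bitbuf_t *b) {
    assert(bitbuf_is_aligned(b));
    uint8_t value = b->bytes[b->bit_index >> 3];
    b->bit_index += 8;
    return value;
}

/* Checked variant (assumed behaviour): -1 on underrun, direct load when
   aligned, otherwise assemble the byte one bit at a time, MSB first. */
static int bitbuf_read_byte(bitbuf_t *b) {
    if (!bitbuf_has(b, 8)) {
        return -1;
    }
    if (bitbuf_is_aligned(b)) {
        return bitbuf_read_byte_unchecked(b);
    }
    int value = 0;
    for (int i = 0; i < 8; i++) {
        size_t bi = b->bit_index++;
        value = (value << 1) | ((b->bytes[bi >> 3] >> (7 - (bi & 7))) & 1);
    }
    return value;
}
```

With the bytes 0xAB 0xCD, an aligned unchecked read returns 0xAB; starting instead at bit 4, the checked read returns 0xBC (the low nibble of the first byte followed by the high nibble of the second), and a further checked read returns -1 because only 4 bits remain.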
A very small optimization to plm_video_idct was also added: it avoids an unnecessary sign flip in the y7 calculation by inverting the remaining signs wherever y7 is used.
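The PR text does not include the diff, so the following is only a sketch of the general rewrite, assuming y7 is computed with a leading negation and is afterwards consumed only in sums and differences. x0, t (standing in for the remaining IDCT term) and y6 are hypothetical inputs, not plm_video_idct's actual code.

```c
#include <assert.h>

/* Original shape (assumed): y7 carries a leading negation. */
static void outputs_negated(int x0, int t, int y6, int *out_a, int *out_b) {
    int y7 = -x0 - t;     /* unary minus, then subtraction */
    *out_a = y6 - y7;
    *out_b = y6 + y7;
}

/* Rewritten shape: keep -y7 instead and swap the signs at each use. */
static void outputs_swapped(int x0, int t, int y6, int *out_a, int *out_b) {
    int neg_y7 = x0 + t;  /* equals -y7, no negation needed */
    *out_a = y6 + neg_y7;
    *out_b = y6 - neg_y7;
}
```

Because -x0 - t equals -(x0 + t) exactly for integers (barring overflow), both functions produce identical outputs; for x0 = 3, t = 4, y6 = 10 both yield 17 and 3. An optimizing compiler may already perform this rewrite on its own, so the gain from this change alone is probably small.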
Some warnings specific to Visual Studio were also removed.
Overall, this yields a 5% to 7% performance improvement in my test cases.
As a side note, I also experimented with SIMD, particularly for plm_video_idct. I got it working, but performance was either worse (with SSE4.1) or only marginally better (<1%, with AVX2), so I dropped the idea.