Closed etodanik closed 6 months ago
Hey Danny,
The only reason for this naive implementation back then was to handle the edge case for the ring buffer. You can't always do a bulk read/write in a single transaction, because sometimes you have to cross the boundary of the buffer.
The fix would be relatively trivial: check whether the read/write is going to cross the boundary of the buffer, and if so split it into two transactions. Otherwise a single read/write operation is fine. You can submit a patch if you want; otherwise I can go through it over the weekend.
Owner
Something like https://github.com/saprykin/plibsys/pull/105?
@etodanik Can you please confirm that the performance issue is solved in the merged version?
Yes, I can confirm that the performance issue is resolved.
I'm doing some IPC communication for graphics buffers and
plibsys
was a MAJOR slowdown. I narrowed the problem down to the following: https://github.com/saprykin/plibsys/blob/master/src/pshmbuffer.c#L196-L199

Now, on Windows + the Visual Studio compiler, simply getting rid of the for loop and doing:
memcpy((pchar*)storage, (pchar*)addr + P_SHM_BUFFER_DATA_OFFSET + (read_pos % buf->size), to_copy);
instead yields a 50x-100x performance boost (I'm going down from magnitudes of 60-100ms on a large buffer to just 1-2ms). The performance difference is dramatic. I will check whether the same holds for the other compilers I'm working with (gcc/clang on macOS and Linux).
The question is: is there a specific reason that for loop is there? Are there any environments where you have confirmed through benchmarking that the loop is faster? To me it seems like needlessly opting out of compiler optimizations.