Closed etodanik closed 6 months ago
Hey Danny,
The only reason for this naive implementation back then was to handle the edge case for the ring buffer. You can't always do a bulk read/write in a single transaction, because sometimes you have to cross the boundary of the buffer.
The fix would be relatively trivial: check whether the read/write is going to cross the boundary of the buffer, and if so split it into two transactions. Otherwise a single read/write operation is fine. You can submit a patch if you want; otherwise I can go through it over the weekend.
Owner
Something like https://github.com/saprykin/plibsys/pull/105?
@etodanik Can you please confirm that the performance issue is solved in the merged version?
Yes, I can confirm that the performance issue is resolved.
I'm doing some IPC communication for graphics buffers and
plibsys
was a MAJOR slowdown. I narrowed the problem down to the following: https://github.com/saprykin/plibsys/blob/master/src/pshmbuffer.c#L196-L199

Now, on Windows + the Visual Studio compiler, simply getting rid of the for loop and doing:
memcpy((pchar*)storage, (pchar*)addr + P_SHM_BUFFER_DATA_OFFSET + (read_pos % buf->size), to_copy);
instead yields a 50x-100x performance boost (I'm going down from magnitudes of 60-100ms on a large buffer to just 1-2ms). The performance difference is dramatic. I will check whether the same holds for the other compilers I'm working with (gcc/clang on macOS and Linux).
The question is: is there a specific reason that for loop is there? Are there any environments where you have confirmed through benchmarking that the loop is faster? To me it seems like needlessly opting out of compiler optimizations.