tamara-schmitz / CPPsoftwareRenderer

3D SoftwareRenderer written in C++11 using SDL
MIT License

Multithreaded rendering / rasterisation #8

Open tamara-schmitz opened 7 years ago

tamara-schmitz commented 7 years ago

Essentially, two sections can be multithreaded: vertex shading and rasterisation (including pixel shading). Listed below are a few multithreading concept proposals:

Vertex shader:

Rasterisation:
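A minimal sketch of the vertex-shading half, independent of the proposals above and not taken from the renderer: the vertex array is split into contiguous chunks and each `std::thread` shades its own chunk. `Vertex`, `shadeVertex` and `shadeVerticesParallel` are hypothetical names, and the "shader" is only a placeholder scale.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical vertex type; the renderer's real vertex class will differ.
struct Vertex {
    float x, y, z, w;
};

// Hypothetical stand-in for a vertex shader: scales x, y and z by 2.
Vertex shadeVertex(const Vertex& v) {
    return Vertex{v.x * 2.0f, v.y * 2.0f, v.z * 2.0f, v.w};
}

// Splits the input into contiguous chunks, one per worker thread.
// Each worker writes only to its own range of `out`, so no lock is needed.
void shadeVerticesParallel(const std::vector<Vertex>& in,
                           std::vector<Vertex>& out,
                           unsigned threadCount) {
    out.resize(in.size());
    if (threadCount == 0) threadCount = 1;
    const std::size_t chunk = (in.size() + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin >= end) break;  // fewer vertices than threads
        workers.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = shadeVertex(in[i]);
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every chunk
}
```

Because every worker writes to a disjoint range of `out`, no lock is needed. Spawning threads on every call has a per-frame cost; persistent worker threads fed from a queue, as discussed in the rest of this issue, avoid it.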

tamara-schmitz commented 7 years ago

The following has been implemented since at least commit 32085e50c7e50971d03a0a6fa55066762b709328:

Queue fetches

Use SafeQueue to reduce complications. Every frame, the main thread should announce how many triangles it has sent out for rendering. The worker threads can then decide whether to call pop() in blocking mode or to notify the main thread that they have finished their work.

tamara-schmitz commented 7 years ago

Memory fencing

SafeQueues are in place, but they rely on locks to prevent race conditions. Read about memory fencing as a lock-free alternative: https://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer?page=0,1
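The core idea behind lock-free hand-offs is acquire/release ordering. A minimal sketch using `std::atomic` memory orders (rather than explicit `std::atomic_thread_fence` calls, and much simpler than the multi-producer/multi-consumer queue in the linked article): the producer writes a triangle batch with ordinary stores and then sets a flag with `memory_order_release`; a consumer that reads the flag with `memory_order_acquire` is guaranteed to see the complete batch. `Triangle` and `BatchHandoff` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical triangle record used only for this illustration.
struct Triangle {
    float v[9];
};

// One-shot hand-off of a triangle batch without a mutex.
struct BatchHandoff {
    std::vector<Triangle> batch;
    std::atomic<bool> ready{false};

    // Producer side: plain write of the data, then a release store.
    // Every write made before the release store becomes visible to a
    // thread whose acquire load observes ready == true.
    void publish(std::vector<Triangle> triangles) {
        batch = std::move(triangles);
        ready.store(true, std::memory_order_release);
    }

    // Consumer side: spin (yielding) until the flag is set, then read.
    const std::vector<Triangle>& wait() const {
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        return batch;  // safe: ordered after the producer's writes
    }
};
```

This hand-off is one-shot; a reusable queue needs the same release/acquire pairing applied per slot or per index.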

Circular buffers

Switching from a queue to a circular buffer seems like a good idea, as it guarantees that no reallocations happen during push and pop, and no memory allocations are needed at runtime at all. However, the buffer size is fixed: once the write pointer catches up with the read pointer, the buffer is full and writing stalls. Check out Wikipedia for more information: https://en.wikipedia.org/wiki/Circular_buffer This may also be useful: https://www.codeproject.com/Articles/153898/Yet-another-implementation-of-a-lock-free-circul

Other ideas

Use a stack. This would also eliminate reallocations, but the constant allocations and deallocations may degrade performance.
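A stack does not have to allocate per element. A minimal sketch (hypothetical `BoundedStack`, not from the project) backed by storage allocated once in the constructor: push and pop then neither reallocate nor allocate, unlike a linked-node stack, but the capacity becomes fixed just as with a circular buffer. A mutex keeps the sketch short, so it is not lock-free, and the LIFO order hands triangles out in reverse submission order.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Fixed-capacity LIFO work stack; T must be default-constructible.
template <typename T>
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : items_(capacity), size_(0) {}

    // Returns false instead of growing when the stack is full.
    bool tryPush(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (size_ == items_.size()) return false;
        items_[size_++] = value;
        return true;
    }

    // Returns false when the stack is empty; otherwise yields the newest item.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (size_ == 0) return false;
        out = items_[--size_];
        return true;
    }

private:
    std::vector<T> items_;  // allocated once, never resized
    std::size_t size_;      // number of occupied slots
    std::mutex mutex_;
};
```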

tamara-schmitz commented 7 years ago

Current status

Threading mostly works, though I suspect a race condition in the vertex processor (VP) when more than one VP is running. See commit 016ed891ff16d3f5e035124ae7b319a378c29229

As expected, performance is poor, because every triangle fetch by the rasteriser currently requires taking a lock.

tamara-schmitz commented 7 years ago

Other possible improvements

Profiling is required, but the VertexProcessorObjs may slow things down because they all hold shared references to the same texture. Concurrent reference counting could have a significant impact on performance.
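The issue does not say what kind of shared reference is involved; assuming it is `std::shared_ptr`, a minimal sketch shows where the cost comes from. Every copy and destruction of a `shared_ptr` atomically updates a reference count that all threads share, while passing it by `const&` or holding a non-owning pointer leaves the count untouched. `Texture`, `VertexProcessorSketch` and the two helper functions are hypothetical.

```cpp
#include <memory>
#include <vector>

// Hypothetical texture type standing in for the renderer's texture class.
struct Texture {
    std::vector<unsigned> texels;
};

// By value: the parameter is a copy, so the shared count is atomically
// incremented on entry and decremented again when the copy is destroyed.
long useCountByValue(std::shared_ptr<Texture> tex) {
    return tex.use_count();
}

// By const reference: no copy, so the shared count is not touched.
long useCountByRef(const std::shared_ptr<Texture>& tex) {
    return tex.use_count();
}

// A processor that only reads the texture can hold a non-owning pointer
// while some shared_ptr elsewhere keeps the texture alive.
struct VertexProcessorSketch {
    const Texture* texture;
};
```

Profiling should still confirm whether this contention matters before changing ownership.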

tamara-schmitz commented 2 years ago

SafeQueue was rewritten so that it is the only queue type required. The only remaining issues are copying the rasteriser textures back to the main thread and rendering them.