Rendering layer speed up - multi-threaded GPU kernel

tranluan / Nonlinear_Face_3DMM

Source code for "Nonlinear 3D Face Morphable Model"

http://cvlab.cse.msu.edu/project-nonlinear-3dmm.html

Apache License 2.0


Rendering layer speed up - multi-threaded GPU kernel #1

Open tranluan opened 6 years ago

tranluan commented 6 years ago

In line 138 of TF_newop/cuda_op_kernel_v2_sz224.cu.cc, we have to set block_count and thread_per_block to 1, i.e. ZbufferTriKernel<<<1, 1>>>(s2d, tri,...), since the kernel needs read/write access to the zbuffer and concurrent threads would race on it. Implementing some sort of critical section for the block at lines 102-106 could fix this.

Running the kernel with many threads could then give a significant speedup for this operation.
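One possible way to get the effect of a critical section without a lock is to turn the depth test into a single atomic "keep the minimum" update. The following is a minimal CPU-side C++ sketch of that idea, not the repository's code. It assumes that lines 102-106 perform a read-compare-write depth test, that depths are non-negative, and that a smaller depth wins (flip the comparison if the renderer uses the opposite convention). The names pack_depth_tri, unpack_tri and atomic_min are hypothetical. In the CUDA kernel the analogous primitive would be atomicMin or an atomicCAS loop on an unsigned long long, which would also require changing the zbuffer to a packed 64-bit layout.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pack a non-negative depth and a triangle index into one
// 64-bit key. For non-negative IEEE-754 floats the raw bit pattern sorts in the
// same order as the value, so comparing keys compares depth first and uses the
// triangle index as a tie-break.
inline std::uint64_t pack_depth_tri(float depth, std::uint32_t tri) {
    std::uint32_t bits;
    std::memcpy(&bits, &depth, sizeof(bits));
    return (static_cast<std::uint64_t>(bits) << 32) | tri;
}

// Recover the triangle index stored in the low 32 bits of a key.
inline std::uint32_t unpack_tri(std::uint64_t key) {
    return static_cast<std::uint32_t>(key & 0xFFFFFFFFu);
}

// Lock-free "keep the smaller key". The compare-and-swap loop replaces an
// unsynchronized read-compare-write sequence on the depth cell.
inline void atomic_min(std::atomic<std::uint64_t>& cell, std::uint64_t candidate) {
    std::uint64_t current = cell.load(std::memory_order_relaxed);
    while (candidate < current &&
           !cell.compare_exchange_weak(current, candidate,
                                       std::memory_order_relaxed)) {
        // On failure, compare_exchange_weak reloads `current`; the loop then
        // re-checks whether `candidate` still wins.
    }
}
```

Each thread would call atomic_min(zbuffer[pixel], pack_depth_tri(z, tri_id)) for every pixel its triangle covers, with every cell initialised to UINT64_MAX. Because the minimum does not depend on the order of the updates, the winning triangle per pixel should match the single-threaded result, including ties.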

chaoshiedwin commented 4 years ago

Hi @tranluan, I only changed ZbufferTriKernel<<<1, 1>>>(s2d, tri,...) to ZbufferTriKernel<<<32, 256>>>(s2d, tri,...), and the result of rendering_example.py seems correct. Has this issue been fixed?

tranluan commented 4 years ago

It's only correct if you get a result identical to the one from ZbufferTriKernel<<<1, 1>>>(s2d, tri,...). It's very likely that a few pixels will be different.
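The identity check described above can be sketched as an exact pixel-by-pixel comparison. This is only an illustration: count_mismatched_pixels is a hypothetical helper, and it assumes both renders have already been copied into flat float buffers of the same size, with `reference` coming from the <<<1, 1>>> launch and `candidate` from the parallel launch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical check: count the pixels where two renders disagree.
inline std::size_t count_mismatched_pixels(const std::vector<float>& reference,
                                           const std::vector<float>& candidate) {
    if (reference.size() != candidate.size()) {
        throw std::invalid_argument("renders must have the same number of pixels");
    }
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        // Exact comparison on purpose: the criterion is an identical result.
        if (reference[i] != candidate[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A return value of 0 means the two launches agree on this input. Since a race may only show up occasionally, it is probably worth repeating the comparison over several runs.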

chaoshiedwin commented 4 years ago

Yes, it is not thread-safe, but I think it's OK for training, because neural networks are robust to noise. I also modified your shading function, and the training time has dropped from 6 s/step to 0.5 s/step.

tranluan commented 4 years ago

It's great to know. Please commit your shading function if possible. Thanks, Luan