Open tranluan opened 6 years ago
Hi @tranluan, I only changed ZbufferTriKernel<<<1, 1>>>(s2d, tri,...) to ZbufferTriKernel<<<32, 256>>>(s2d, tri,...), and the result of rendering_example.py seems correct. Was this issue fixed?
It's only correct if you get a result identical to the one from ZbufferTriKernel<<<1, 1>>>(s2d, tri,...). It's very likely that a few pixels will be different.
Yes, it is not thread-safe, but I think it's OK for training, because neural networks are robust to noise. I also modified your shading function, and the training speed has increased from 6 s/step to 0.5 s/step.
It's great to know. Please commit your shading function if possible. Thanks, Luan
In line 138 of TF_newop/cuda_op_kernel_v2_sz224.cu.cc, we have to set block_count and thread_per_block to 1:
ZbufferTriKernel<<<1, 1>>>(s2d, tri,...)
since it needs read/write access to the zbuffer. Implementing some sort of critical section for the block from line 102 - 106 could fix it. This could lead to a significant speedup of this operation.
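One possible way to get the effect of a critical section without a lock is to pack the depth and the triangle index into a single 64-bit value and update it with one atomic operation. The sketch below is not the code from cuda_op_kernel_v2_sz224.cu.cc: `PackDepthIndex`, `DepthTestKernel`, and `RunDepthTestDemo` are hypothetical names, and it assumes that depths are non-negative, that a larger depth value wins the test, and that the GPU supports the 64-bit `atomicMax` overload.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: pack a non-negative depth and a triangle index into one
// 64-bit key. For non-negative IEEE-754 floats the raw bit pattern orders the
// same way as the float value, so comparing keys compares depth first.
__device__ unsigned long long PackDepthIndex(float depth, unsigned int tri_index) {
  return (static_cast<unsigned long long>(__float_as_uint(depth)) << 32) | tri_index;
}

// Hypothetical stand-in for the per-pixel update inside the z-buffer kernel.
// Each thread proposes one (depth, triangle) candidate for a pixel. The depth
// test and the write of both values happen in a single atomic operation.
// Assumption: larger depth wins; flip to atomicMin if smaller depth is closer.
__global__ void DepthTestKernel(const float* depth, const int* pixel, int n,
                                unsigned long long* zbuffer) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  atomicMax(&zbuffer[pixel[i]], PackDepthIndex(depth[i], static_cast<unsigned int>(i)));
}

// Host-side demo: 1024 candidates all target pixel 0, and candidate 700 has
// the largest depth. Returns the winning triangle index, or -1 on a CUDA error.
int RunDepthTestDemo() {
  const int n = 1024;
  const int kWinner = 700;
  std::vector<float> h_depth(n);
  std::vector<int> h_pixel(n, 0);
  for (int i = 0; i < n; ++i) {
    h_depth[i] = (i == kWinner) ? 1000.0f : static_cast<float>(i % 512);
  }

  float* d_depth = nullptr;
  int* d_pixel = nullptr;
  unsigned long long* d_zbuffer = nullptr;
  if (cudaMalloc(&d_depth, n * sizeof(float)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_pixel, n * sizeof(int)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_zbuffer, sizeof(unsigned long long)) != cudaSuccess) return -1;

  cudaMemcpy(d_depth, h_depth.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_pixel, h_pixel.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_zbuffer, 0, sizeof(unsigned long long));  // key 0 = depth 0.0

  DepthTestKernel<<<4, 256>>>(d_depth, d_pixel, n, d_zbuffer);

  // This blocking copy waits for the kernel on the default stream to finish.
  unsigned long long key = 0;
  cudaError_t err =
      cudaMemcpy(&key, d_zbuffer, sizeof(key), cudaMemcpyDeviceToHost);

  cudaFree(d_depth);
  cudaFree(d_pixel);
  cudaFree(d_zbuffer);
  if (err != cudaSuccess) return -1;

  // The low 32 bits of the winning key hold the triangle index.
  return static_cast<int>(key & 0xFFFFFFFFull);
}
```

With this layout the result does not depend on thread scheduling, so a launch such as <<<32, 256>>> should give the same winner per pixel as <<<1, 1>>>, except when two triangles have exactly the same depth at a pixel. In that case this sketch picks the larger triangle index, which may differ from what a sequential loop would keep.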