zzzxxxttt opened this issue 1 year ago
Thanks for writing a detailed explanation! If you can convert it to LaTeX, I will be very happy to include the derivation in the supplement. I had a version but lost its LaTeX copy.
Thank you for your reply! I withdraw my previous comment since I found it incomplete; two questions remain:
The first question is, what does this "option 2" mean?
And the second question: I created a simple test case in which the predicted sigma is (for brevity I omit the batch and time dimensions here):
```
[[[0, 0, 0, 0, 100],
  [0, 0, 0, 0,   0],
  [0, 0, 0, 0,   0],
  [0, 0, 0, 0,   0],
  [0, 0, 0, 0,   0]]]
```
And the origin is at `[0, 0, 0]` and the end point is at `[4, 0, 0]`.
Now I pass the sigma and points to `dvr` rendering; the returned gradient is:

```
[[[-4, -3, -2, -1, 0],
  [ 0,  0,  0,  0, 0],
  [ 0,  0,  0,  0, 0],
  [ 0,  0,  0,  0, 0],
  [ 0,  0,  0,  0, 0]]]
```
This is confusing: the predicted occupancy is perfectly aligned with the gt point, yet the gradient is still very large, especially near the origin?
Hi @zzzxxxttt, great question and thanks for the example. It may seem unintuitive, but the code is working as intended. I will try to unpack it below. Let me know if any part doesn't make sense.
First, the returned gradient is the derivative of d (predicted depth) w.r.t. sigma (predicted density). To simplify the example, let's assume a 1-D grid with 5 voxels to consider. We predict 5 densities (s for sigma): s[0], s[1], s[2], s[3], and s[4].
Say the probability of a ray terminating at voxel 0 is p[0] = 1 - exp(-s[0]). Similarly, the probability of the same ray terminating at voxel 1 can be written as p[1] = exp(-s[0]) * (1 - exp(-s[1])), which equals the probability of it not terminating at voxel 0 times the conditional probability that it terminates at voxel 1.
Following this logic, we can write out the probability that the ray terminates at voxel 4 as p[4] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * (1 - exp(-s[4])).
It is also possible that the ray terminates outside the voxel grid, which we write as p[out] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * exp(-s[4]).
Note that p[0] + p[1] + p[2] + p[3] + p[4] + p[out] = 1.
Now we can write the predicted depth as d = p[0]*0 + p[1]*1 + p[2]*2 + p[3]*3 + p[4]*4 + p[out]*4.
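To make this concrete, here is a minimal numeric sketch of the formulas above (just the 1-D math in PyTorch, not the actual `dvr.cu` kernel) that recomputes the termination probabilities and the expected depth for the example sigma:

```python
import torch

# The example sigma from above: 1-D grid, 5 voxels.
s = torch.tensor([0., 0., 0., 0., 100.])

# T[i] = exp(-(s[0] + ... + s[i-1])): probability the ray reaches voxel i.
T = torch.exp(-torch.cat([torch.zeros(1), torch.cumsum(s, 0)[:-1]]))
p = T * (1 - torch.exp(-s))        # p[i]: ray terminates at voxel i
p_out = T[-1] * torch.exp(-s[-1])  # ray leaves the grid without terminating

print(p.sum() + p_out)  # 1.0 -- the six events are exhaustive
d = (p * torch.arange(5.)).sum() + p_out * 4  # "option 2": p[out] gets depth 4
print(d)                # 4.0 -- matches the gt end point
```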
Now to your first question: option 2 refers to the fact that we assign to p[out] (the event where the ray terminates outside the voxel grid) the same depth we assign to p[4], i.e., 4.
To your second question: if we expand the formula for the predicted depth using p[4] + p[out] = 1 - p[0] - p[1] - p[2] - p[3], we get d = 4 - p[0]*4 - p[1]*3 - p[2]*2 - p[3]*1. Notice there is no p[4] term (due to option 2 in this case), which explains why dd_dsigma[4] is equal to 0.
Let's compute dd_dsigma[3]. Following the chain rule: d(d)/d(s[3]) = d(d)/d(p[3]) * d(p[3])/d(s[3]). We know that d(d)/d(p[3]) = -1 and d(p[3])/d(s[3]) = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) = 1. Therefore, d(d)/d(s[3]) = -1.
Similarly, you can compute d(d)/d(s[2]) = d(d)/d(p[2]) * d(p[2])/d(s[2]) + d(d)/d(p[3]) * d(p[3])/d(s[2]) = (-2) * 1 + (-1) * 0 = -2. And you can do the same for d(d)/d(s[1]) and d(d)/d(s[0]) as well.
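If you want to double-check the hand derivation, a quick way is to run the same expected-depth formula through autograd (again, a sketch of the math rather than the CUDA kernel) and compare against the gradient the renderer returns:

```python
import torch

sigma = torch.tensor([0., 0., 0., 0., 100.], requires_grad=True)

# Same formulas as above, now differentiable w.r.t. sigma.
T = torch.exp(-torch.cat([torch.zeros(1), torch.cumsum(sigma, 0)[:-1]]))
p = T * (1 - torch.exp(-sigma))
p_out = T[-1] * torch.exp(-sigma[-1])

# Expected depth with "option 2" (escaping rays get depth 4).
d = (p * torch.arange(5.)).sum() + p_out * 4
d.backward()
print(sigma.grad)  # ~tensor([-4., -3., -2., -1., 0.]) -- matches dvr's output
```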
Here, sigma is a non-negative quantity and is the output of a ReLU, which is non-differentiable at x = 0. When the input to the ReLU is less than or equal to 0, we use a zero sub-gradient, which means that during backprop all the weights before the ReLU get zero gradients and therefore won't be updated.
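For reference, this is what the zero sub-gradient at the ReLU kink looks like in PyTorch (a tiny standalone sketch, independent of the renderer):

```python
import torch

x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1.]) -- zero gradient at and below x = 0
```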
In case you are interested, here is a somewhat more complete derivation: raytrace.pdf
Very nice explanation, thanks @peiyunh! As for the non-differentiable 0 in ReLU: I tried setting sigma to [0.001, 0.001, 0.001, 0.001, 100], and the returned gradient is [-3.9990, -2.9991, -1.9993, -0.9996, -0.0000], still very large near the origin, so maybe the non-differentiable 0 is not the key point?
Hi @tarashakhurana,
In model.py the occupancy probability is calculated as `pog = 1 - torch.exp(-sigma)`. What is the reason behind this function `1 - exp(-sigma)`? And I found that in dvr.cu, the occupancy probability is computed as `p[count] = 1 - exp(-sd)`, where `sd = _sigma * _delta`. Why is there a `* _delta` involved?