tarashakhurana / 4d-occ-forecasting

CVPR 2023: Official code for "Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting"
https://www.cs.cmu.edu/~tkhurana/ff4d/index.html
MIT License

Question about occupancy probability #10

Open zzzxxxttt opened 1 year ago

zzzxxxttt commented 1 year ago

Hi @tarashakhurana,

In model.py, the occupancy probability is calculated in the line pog = 1 - torch.exp(-sigma). What is the reason behind this function 1 - exp(-sigma)? I also found that in dvr.cu, the occupancy probability is computed as p[count] = 1 - exp(-sd), where sd = _sigma * _delta. Why is there a * _delta involved?
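
For reference, here are the two expressions side by side in plain PyTorch (a minimal sketch with made-up inputs; the names sigma and delta are mine, standing in for the per-voxel density and the length of the ray segment inside each voxel):

```python
import torch

sigma = torch.rand(5)  # per-voxel density (non-negative, e.g. after a ReLU)
delta = torch.ones(5)  # length of the ray segment inside each voxel

p_model = 1 - torch.exp(-sigma)          # as in model.py
p_dvr   = 1 - torch.exp(-sigma * delta)  # as in dvr.cu, with sd = _sigma * _delta

# When delta == 1 (segments of one voxel length), the two expressions coincide.
print(torch.allclose(p_model, p_dvr))    # True
```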

tarashakhurana commented 11 months ago

Thanks for writing a detailed explanation! If you can convert it to LaTeX, I will be very happy to include the derivation in the supplement. I had a version but I lost its LaTeX copy.

zzzxxxttt commented 10 months ago

Thank you for your reply! I withdrew my previous comment since I found it was incomplete. There are still two remaining questions:

The first question is: what does this "option 2" mean? [image attached]

And the second question: I created a simple test case in which the predicted sigma is (for brevity, I omit the batch and time dimensions here):

[[[0, 0, 0, 0, 100],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]]]

The origin is at [0, 0, 0] and the end point is at [4, 0, 0]. Now I pass the sigma and the points to the dvr rendering, and the returned gradient is:

[[[-4, -3, -2, -1, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0]]]

This is confusing: the predicted occupancy is perfectly aligned with the gt point, but the gradient is still very large, especially near the origin?

peiyunh commented 10 months ago

Hi @zzzxxxttt, great question and thanks for the example. It may seem unintuitive, but the code is working as intended. I will try to unpack it below. Let me know if any part doesn't make sense.

First, the returned gradient is the derivative of d (predicted depth) w.r.t. sigma (predicted density). To simplify the example, let's assume it is a 1-D grid and we have 5 voxels to consider. We predict 5 densities (s for sigma): s[0], s[1], s[2], s[3], and s[4].

Say the probability of a ray terminating at voxel 0 can be written as p[0] = 1 - exp(-s[0]). Similarly, the probability of the same ray terminating at voxel 1 can be written as: p[1] = exp(-s[0]) * (1 - exp(-s[1])), which is equal to the probability of it not terminating at voxel 0 times the conditional probability that it terminates at voxel 1.

Following this logic, we can write out the probability that the ray terminates at voxel 4 as: p[4] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * (1 - exp(-s[4])).

It is also possible that the ray terminates outside the voxel grid, which we write as p[out] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * exp(-s[4]).
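
(In general, for a ray crossing voxels 0 through N-1, this is the standard ray-marching decomposition; written out:)

$$
p[i] = \Big(\prod_{j<i} e^{-s[j]}\Big)\big(1 - e^{-s[i]}\big), \qquad
p[\mathrm{out}] = \prod_{j=0}^{N-1} e^{-s[j]}.
$$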

Note that p[0] + p[1] + p[2] + p[3] + p[4] + p[out] = 1.

Now we can write the predicted depth as: d = p[0] * 0 + p[1] * 1 + p[2] * 2 + p[3] * 3 + p[4] * 4 + p[out] * 4.

Now to your first question: "option 2" refers to the fact that we assign to p[out] (the event where the ray terminates outside the voxel grid) the same depth we assign to p[4] (i.e., 4).

To your second question, if we expand the formula for the predicted depth, we have: d = 4 - p[0] * 4 - p[1] * 3 - p[2] * 2 - p[3] * 1. Notice there is no p[4] (due to option 2 in this case), which explains why dd_dsigma[4] is equal to 0.

Let's compute dd_dsigma[3]. Following the chain rule: d(d)/d(s[3]) = d(d)/d(p[3]) * d(p[3])/d(s[3]). We know that d(d)/d(p[3]) = -1 and d(p[3])/d(s[3]) = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) = 1. Therefore, d(d)/d(s[3]) = -1.

Similarly, you can compute d(d)/d(s[2]) = d(d)/d(p[2]) * d(p[2])/d(s[2]) + d(d)/d(p[3]) * d(p[3])/d(s[2]) = (-2) * 1 + (-1) * 0 = -2. And you can do the same for d(d)/d(s[1]) and d(d)/d(s[0]) as well.
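
To make this concrete, here is a standalone sketch in plain PyTorch (not the repository's dvr.cu kernel) that builds the same expected-depth expression and lets autograd reproduce the gradient from the example above:

```python
import torch

# Densities from the 1-D example: the ray crosses voxels 0..4, surface at voxel 4.
s = torch.tensor([0., 0., 0., 0., 100.], requires_grad=True)

# Termination probability per voxel: transmittance up to the voxel times the
# probability of stopping inside it, plus the probability of escaping the grid.
trans = torch.cumprod(torch.exp(-s), dim=0)  # transmittance after each voxel
p = torch.cat([torch.ones(1), trans[:-1]]) * (1 - torch.exp(-s))
p_out = trans[-1]

# Expected depth; "option 2" assigns the last voxel's depth (4) to escaping rays.
depth = (torch.arange(5.) * p).sum() + 4 * p_out
depth.backward()

print(s.grad)  # tensor([-4., -3., -2., -1., 0.]) -- matches the dvr output above
```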

Here, sigma is a non-negative quantity and is the output of a ReLU function, which is non-differentiable at x = 0. When the input to the ReLU is equal to or less than 0, we define a zero sub-gradient, which means that during backprop, all the weights before the ReLU get zero gradients and therefore won't be updated.
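
A quick check of the zero sub-gradient behaviour (in PyTorch, which is what model.py uses):

```python
import torch

x = torch.tensor([-1., 0., 1.], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1.]) -- zero gradient at and below the kink
```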

peiyunh commented 10 months ago

In case you are interested, here is a somewhat more complete derivation: raytrace.pdf

zzzxxxttt commented 10 months ago

Very nice explanation, thanks @peiyunh! As for the non-differentiable 0 in ReLU, I tried setting the sigma to [0.001, 0.001, 0.001, 0.001, 100], and the returned gradient is [-3.9990, -2.9991, -1.9993, -0.9996, -0.0000], which is still very large near the origin. Maybe the non-differentiable 0 is not the key point?