tarashakhurana / 4d-occ-forecasting

CVPR 2023: Official code for "Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting"
https://www.cs.cmu.edu/~tkhurana/ff4d/index.html
MIT License

Question about occupancy probability #10

Open zzzxxxttt opened 1 year ago

zzzxxxttt commented 1 year ago

Hi @tarashakhurana,

In model.py, the occupancy probability is calculated as pog = 1 - torch.exp(-sigma). What is the reasoning behind the function 1 - exp(-sigma)? I also found that in dvr.cu, the occupancy probability is computed as p[count] = 1 - exp(-sd), where sd = _sigma * _delta. Why is there a * _delta involved?
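For reference, here is a minimal sketch of the two expressions side by side (the density values and delta below are made up for illustration):

```python
import torch

sigma = torch.tensor([0.5, 1.0, 2.0])  # hypothetical predicted densities
delta = 0.2                            # hypothetical ray-segment length per voxel

pog_model = 1 - torch.exp(-sigma)        # as written in model.py
pog_dvr = 1 - torch.exp(-sigma * delta)  # as written in dvr.cu (sd = _sigma * _delta)
```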

tarashakhurana commented 1 year ago

Thanks for writing a detailed explanation! If you can convert it to LaTeX, I will be very happy to include the derivation in the supplement. I had a version but lost its LaTeX copy.

zzzxxxttt commented 1 year ago

Thank you for your reply! I have withdrawn my previous comment since I found it incomplete; two questions remain:

The first question is: what does this "option 2" mean? [attached screenshot]

And the second question: I created a simple test case in which the predicted sigma is (for brevity, I omit the batch and time dimensions here):

[[[0, 0, 0, 0, 100],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]]]

The origin is at [0, 0, 0] and the end point is at [4, 0, 0]. When I pass the sigma and points to the dvr rendering, the returned gradient is:

[[[-4, -3, -2, -1, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0]]]

This is confusing: the predicted occupancy is perfectly aligned with the ground-truth point, yet the gradient is still very large, especially near the origin.

peiyunh commented 1 year ago

Hi @zzzxxxttt , great question and thanks for the example. It may seem unintuitive, but the code is working as intended. I will try to unpack it below. Let me know if there is any part that makes no sense.

First, the returned gradient is the derivative of d (the predicted depth) w.r.t. sigma (the predicted density). To simplify the example, let's assume a 1-D grid with 5 voxels. We predict 5 densities (s for sigma): s[0], s[1], s[2], s[3], and s[4].

Say the probability of a ray terminating at voxel 0 can be written as p[0] = 1 - exp(-s[0]). Similarly, the probability of the same ray terminating at voxel 1 can be written as p[1] = exp(-s[0]) * (1 - exp(-s[1])), which is the probability of it not terminating at voxel 0 times the conditional probability that it terminates at voxel 1.

Following this logic, we can write the probability that the ray terminates at voxel 4 as p[4] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * (1 - exp(-s[4])).

It is also possible that the ray terminates outside the voxel grid, which we write as p[out] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * exp(-s[4]).

Note that p[0] + p[1] + p[2] + p[3] + p[4] + p[out] = 1.
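Here is a small sketch of these quantities in code (illustrative only, assuming the 1-D, 5-voxel setup above):

```python
import torch

s = torch.tensor([0.0, 0.0, 0.0, 0.0, 100.0])  # the densities from your test case

# Transmittance entering each voxel: probability the ray survived all earlier voxels.
T_before = torch.cat([torch.ones(1), torch.exp(-torch.cumsum(s, dim=0))[:-1]])
p = T_before * (1 - torch.exp(-s))  # p[i]: probability the ray terminates in voxel i
p_out = torch.exp(-s.sum())         # probability the ray exits the grid

assert torch.isclose(p.sum() + p_out, torch.tensor(1.0))
# p -> [0, 0, 0, 0, ~1], p_out -> ~0
```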

Now we can write the predicted depth as d = p[0] * 0 + p[1] * 1 + p[2] * 2 + p[3] * 3 + p[4] * 4 + p[out] * 4.

Now to your first question: option 2 refers to assigning the same depth we assign to p[4] (i.e., 4) to p[out], the event where the ray terminates outside the voxel grid.

To your second question: if we expand the formula for the predicted depth (using the fact that p[0] + p[1] + p[2] + p[3] + p[4] + p[out] = 1), we get d = 4 - p[0] * 4 - p[1] * 3 - p[2] * 2 - p[3] * 1. Notice there is no p[4] term (due to option 2 in this case), which explains why dd_dsigma[4] is equal to 0.
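In code, continuing the sketch above (repeated here so it runs standalone):

```python
import torch

s = torch.tensor([0.0, 0.0, 0.0, 0.0, 100.0])
T_before = torch.cat([torch.ones(1), torch.exp(-torch.cumsum(s, dim=0))[:-1]])
p = T_before * (1 - torch.exp(-s))
p_out = torch.exp(-s.sum())

depths = torch.arange(5, dtype=torch.float32)  # voxel depths 0..4
d = (p * depths).sum() + p_out * 4.0           # option 2: p[out] is assigned depth 4
# After expansion: d = 4 - 4*p[0] - 3*p[1] - 2*p[2] - 1*p[3], with no p[4] term.
print(d)  # ~4.0 for this sigma
```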

Let's compute dd_dsigma[3]. Following the chain rule, d(d)/d(s[3]) = d(d)/d(p[3]) * d(p[3])/d(s[3]). We know that d(d)/d(p[3]) = -1 and d(p[3])/d(s[3]) = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) = 1 in this example. Therefore, d(d)/d(s[3]) = -1.

Similarly, you can compute d(d)/d(s[2]) = d(d)/d(p[2]) * d(p[2])/d(s[2]) + d(d)/d(p[3]) * d(p[3])/d(s[2]) = (-2) * 1 + (-1) * 0 = -2. You can do the same for d(d)/d(s[1]) and d(d)/d(s[0]) as well.
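You can check the whole derivation end to end with autograd. This is a sketch of the same expected-depth formulation in PyTorch, not the actual CUDA kernel:

```python
import torch

s = torch.tensor([0.0, 0.0, 0.0, 0.0, 100.0], requires_grad=True)

T_before = torch.cat([torch.ones(1), torch.exp(-torch.cumsum(s, dim=0))[:-1]])
p = T_before * (1 - torch.exp(-s))     # termination probability per voxel
p_out = torch.exp(-s.sum())            # ray exits the grid
depths = torch.arange(5, dtype=torch.float32)
d = (p * depths).sum() + p_out * 4.0   # option 2: p[out] gets depth 4

d.backward()
print(s.grad)  # tensor([-4., -3., -2., -1., 0.]), matching the first row you observed
```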

Here, sigma is a non-negative quantity and is the output of a ReLU, which is non-differentiable at x = 0. When the input to the ReLU is equal to or less than 0, we define a zero sub-gradient, which means that during backprop, the weights before the ReLU receive zero gradient through it and therefore won't get updated.
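As a minimal standalone illustration of the zero sub-gradient convention (not the repo's code):

```python
import torch

x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1.]): zero gradient at and below x = 0
```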

peiyunh commented 1 year ago

In case you are interested, here is a somewhat more complete derivation: raytrace.pdf

zzzxxxttt commented 12 months ago

Very nice explanation, thanks @peiyunh! As for the non-differentiable 0 in ReLU, I tried setting the sigma to [0.001, 0.001, 0.001, 0.001, 100], and the returned gradient is [-3.9990, -2.9991, -1.9993, -0.9996, -0.0000], still very large near the origin. Maybe the non-differentiable 0 is not the key point?