zhuyu2015 opened 1 month ago
Line 367 is this one:
float dL_dmd = 2.0f * (T * alpha) * (m_d * final_A - final_D) * dL_dreg;
Hi, here is the formulation for back-propagation of the depth-distortion loss:
$$L= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}w_iw_j(d_i-d_j)^2$$
Now we specifically analyze the $k$-th item and do some simplification,
$$ L_k = \sum_{j=0}^{k-1}w_kw_j(d_k-d_j)^2 + \sum_{i=k+1}^{N-1}w_iw_k(d_i-d_k)^2 $$
$$ L_k = \sum_{j=0}^{k-1}w_kw_j(d_k^2-2d_kd_j + d_j^2) + \sum_{i=k+1}^{N-1}w_iw_k(d_i^2-2d_id_k+d_k^2)$$
Differentiating with respect to $w_k$ (each sum loses one factor of $w_k$) gives: $$\frac{dL_k}{dw_k} = \sum_{j=0}^{k-1}w_j(d_k^2-2d_kd_j+d_j^2) + \sum_{i=k+1}^{N-1}w_i(d_i^2-2d_id_k+d_k^2)$$
Here $A=\sum_{j}w_j$ and $D=\sum_{j}w_jd_j$ (final_A and final_D in the code), and $D^2$ denotes the accumulated squared depth $\sum_{j}w_jd_j^2$ (not the square of $D$); adding and subtracting $w_kd_k$ to complete the sums:
$$\frac{dL_k}{dw_k} = (D^2-w_kd_k^2) + d_k^2(A-w_k) - 2d_k\Big(\sum_{j=0}^{k-1}w_jd_j + \sum_{i=k+1}^{N-1}w_id_i +w_kd_k -w_kd_k\Big) = D^2 +d_k^2A-2d_kD $$
$$\frac{dL_k}{dd_k} = 2\Big(\sum_{j=0}^{k-1}w_kw_j(d_k-d_j) + \sum_{i=k+1}^{N-1}w_iw_k(d_k-d_i)\Big) = 2w_k\big(d_k(A-w_k) - (D-d_kw_k)\big) = 2w_k(d_kA - D)$$
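The two closed forms above are easy to sanity-check numerically. Below is a small Python sketch (not the repository's code; `A`, `D`, and `D2` play the roles of the accumulated sums, i.e. final_A and final_D in the code, with `D2` as the analogous accumulated squared depth), comparing them against central finite differences on the pairwise loss:

```python
import random

# Pairwise depth-distortion loss as in Eq. (17): sum over i > j of w_i w_j (d_i - d_j)^2
def loss(w, d):
    return sum(w[i] * w[j] * (d[i] - d[j]) ** 2
               for i in range(len(w)) for j in range(i))

random.seed(0)
n, k = 6, 2
w = [random.random() for _ in range(n)]
d = [random.random() for _ in range(n)]

A = sum(w)                                       # stand-in for final_A
D = sum(wi * di for wi, di in zip(w, d))         # stand-in for final_D
D2 = sum(wi * di * di for wi, di in zip(w, d))   # accumulated squared depth

dL_dwk = D2 + d[k] ** 2 * A - 2 * d[k] * D       # closed form for dL/dw_k
dL_ddk = 2 * w[k] * (d[k] * A - D)               # closed form for dL/dd_k

# central finite differences
eps = 1e-6
wp, wm = list(w), list(w)
wp[k] += eps; wm[k] -= eps
fd_w = (loss(wp, d) - loss(wm, d)) / (2 * eps)
dp, dm = list(d), list(d)
dp[k] += eps; dm[k] -= eps
fd_d = (loss(w, dp) - loss(w, dm)) / (2 * eps)

assert abs(dL_dwk - fd_w) < 1e-5
assert abs(dL_ddk - fd_d) < 1e-5
```

Note that the check uses the single-counted loss of Eq. (17) ($j<i$); with the doubly-counted full matrix both gradients simply pick up a factor of 2.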
Hi @hbb1 The above formulation is very helpful in understanding the implementation of the backward function. May I further ask about the formulation of the normal loss?
Another quick question: when calculating the derivative, why not consider the case of $dL_k / dw_m$ (when $m \neq k$)?
Because we will instead compute something like $dL / dw_{k-1}$ so that the algorithm can be run efficiently back-to-front. Note that $\frac{\partial L_k}{\partial w_k} \frac{\partial w_k}{\partial w_{k-1}} = \frac{\partial L_k}{\partial w_{k-1}}$.
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
@hbb1 Thanks a lot for your prompt reply. I tried to understand how the backward is implemented these days but still a bit confused. I would be very appreciative if you could explain a bit more.
Based on (17) $$\mathcal{L} = \sum^{N-1}_{i=0} \sum^{i-1}_{j=0} w_i w_j (d_i - d_j)^2$$.
$$w_k = \alpha_k \prod_{i=0}^{k-1}(1-\alpha_i)$$ The derivative of $w_k$ is $$\frac{\partial \mathcal{L}}{\partial w_k} = \frac{\partial \mathcal{L}_k}{\partial w_k} = D^2 + d_k^2 A - 2d_kD $$ As in the answer above, $\mathcal{L}_k$ is the $k$-th term of $$\sum^{N-1}_{i=0} \sum^{N-1}_{j=0} w_i w_j (d_i - d_j)^2 $$, where the second sum runs to $N-1$ rather than to $i-1$ as in (17).
The following code calculates $\frac{\partial \mathcal{L}}{\partial \alpha_k} $
dL_dalpha += dL_dweight - last_dL_dT;
// propagate the current weight W_{i} to next weight W_{i-1}
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
My understanding is that
$$\frac{\partial \mathcal{L}}{\partial \alpha_k} = \sum_{i=0}^{N-1} \frac{\partial \mathcal{L}}{\partial w_i} \frac{\partial w_i}{\partial \alpha_k}$$
I know that $\frac{\partial w_i}{\partial \alpha_k}=0$ when $i<k$. Considering a back-to-front process, we iteratively calculate the following derivatives inside the loop `for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)`
$$\frac{\partial \mathcal{L}}{\partial \alpha_{N-1}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \frac{\partial w_{N-1}}{\partial \alpha_{N-1}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \prod_{i=0}^{N-2}(1-\alpha_i)$$
$$\frac{\partial \mathcal{L}}{\partial \alpha_{N-2}}=\frac{\partial \mathcal{L}}{\partial w_{N-1}} \frac{\partial w_{N-1}}{\partial w_{N-2}}\frac{\partial w_{N-2}}{\partial \alpha_{N-2}}+\frac{\partial \mathcal{L}}{\partial w_{N-2}} \frac{\partial w_{N-2}}{\partial \alpha_{N-2}}=\Big( \frac{\alpha_{N-1}}{\alpha_{N-2}}(1-\alpha_{N-2})\frac{\partial \mathcal{L}}{\partial w_{N-1}} +\frac{\partial \mathcal{L}}{\partial w_{N-2}}\Big)\prod_{i=0}^{N-3}(1-\alpha_i)$$
I was wondering if I made any mistakes. If not, how to get the implementation above based on the formulations. Thanks a lot.
Hi, I found I did not write it clearly. $L$ can be seen as the summation of an $N \times N$ matrix. When we derive the gradient of $w_k$, only some entries are involved; I denoted their sum by $L_k$, so that $\partial L / \partial w_k = \partial L_k / \partial w_k$.
Now let's think about the gradients of $a_k$: they can come both from $L$ through $w_k$ and from the $w_j$ with $j>k$ (occlusion). The $\partial L / \partial a_k$ through $w_k$ is computed as above, and the gradients from the later $w_j$ are computed as
dL_dalpha += dL_dweight - last_dL_dT;
// pass to the next
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
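To see why these two lines implement the full chain rule, here is a hypothetical Python mirror of the back-to-front loop (a sketch, not the CUDA kernel itself): it assumes the upstream gradients $\partial L/\partial w_k$ are given as `g[k]`, folds the transmittance factor `T[k]` in explicitly (the kernel applies it elsewhere), and compares the recursion against a direct chain-rule evaluation.

```python
import random

random.seed(1)
n = 5
alpha = [random.uniform(0.1, 0.9) for _ in range(n)]
g = [random.random() for _ in range(n)]          # dL/dw_k, assumed given

# forward pass: transmittance T_k before Gaussian k, weights w_k = alpha_k * T_k
T = [1.0]
for a in alpha:
    T.append(T[-1] * (1 - a))
w = [alpha[k] * T[k] for k in range(n)]

# back-to-front recursion mirroring the kernel's two lines
dL_dalpha = [0.0] * n
last_dL_dT = 0.0
for k in reversed(range(n)):
    dL_dalpha[k] = (g[k] - last_dL_dT) * T[k]
    last_dL_dT = g[k] * alpha[k] + (1 - alpha[k]) * last_dL_dT

# direct chain rule: dw_k/dalpha_k = T_k, and dw_i/dalpha_k = -w_i/(1-alpha_k) for i > k
direct = [g[k] * T[k] - sum(g[i] * w[i] / (1 - alpha[k]) for i in range(k + 1, n))
          for k in range(n)]

for a, b in zip(dL_dalpha, direct):
    assert abs(a - b) < 1e-9
```

The recursion works because `last_dL_dT` at step $k$ equals $\sum_{i>k} g_i \alpha_i \prod_{j=k+1}^{i-1}(1-\alpha_j)$, which is exactly the occlusion term divided by $T_k$.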
@hbb1 Really appreciate your prompt reply. I can understand most of the parts until the last sentence on $dL/d a_k$. Because of occlusion, $\frac{\partial w_i}{\partial a_k}=0$ where $i < k$, therefore $$\frac{\partial L}{\partial a_k} = \sum_{i=k}^{N-1} \frac{\partial L}{\partial w_i} \frac{\partial w_i}{\partial a_k}$$ Also $$w_k = a_k \prod_{i=0}^{k-1}(1-a_i)$$ which means $w_k = \frac{a_k(1-a_{k-1})}{a_{k-1}}w_{k-1}$. Therefore we have $\frac{\partial w_k}{\partial w_{k-1}}=\frac{a_k(1-a_{k-1})}{a_{k-1}}$ and $\frac{\partial w_k}{\partial a_k}=\prod_{i=0}^{k-1}(1-a_i)=T_{k-1}$. Considering a back-to-front process and starting from the final $n$-th Gaussian: $$\frac{\partial L}{\partial a_n}=\frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial a_n}=\frac{\partial L}{\partial w_n}T_{n-1}$$ $$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}} + \frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}}=\Big(\frac{\partial L}{\partial w_{n-1}} - \frac{a_n(a_{n-1}-1)}{a_{n-1}}\frac{\partial L}{\partial w_{n}} \Big)T_{n-1}$$.
Based on the code
dL_dalpha += dL_dweight - last_dL_dT;
// pass to the next
last_dL_dT = dL_dweight * alpha + (1 - alpha) * last_dL_dT;
Denoting last_dL_dT as $\frac{d L}{d T_{k+1}}$, and ignoring $T_k$, within the loop:
$$\frac{d L}{d T_{n+1}}=0$$
$$\frac{\partial L}{\partial a_n}=\frac{\partial L}{\partial w_n} - \frac{d L}{d T_{n+1}} =\frac{\partial L}{\partial w_n} $$
$$\frac{d L}{d T_{n}}=\frac{\partial L}{\partial w_n}a_n+(1-a_n)\frac{d L}{d T_{n+1}}=a_n\frac{\partial L}{\partial w_n}$$
$$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} - \frac{d L}{d T_{n}} = \frac{\partial L}{\partial w_{n-1}} - a_n\frac{\partial L}{\partial w_n}$$.
I find the term $\frac{a_n}{a_{n-1}}\frac{\partial L}{\partial w_{n}}$ is missing.
May I know if I made any mistake? Thanks a lot.
Thanks guys for the detailed equations. It really helped me better understand the computations.
In short, the backward computation here is actually very similar to $\frac{\partial L}{\partial \alpha}$ for the color channels; the only difference is that last_color is replaced by dL_dweight. We don't really need things like $\frac{\partial w_n}{\partial w_{n-1}}$, we just need the explicit form of every term in
$$\sum_{i=k}^{n} \frac{\partial w_i}{\partial \alpha_k}.$$
And this can be implemented recursively if we do it back-to-front.
@YanhaoZhang I think the problem is that in this equation:
$$\frac{\partial L}{\partial a_{n-1}}= \frac{\partial L}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}} + \frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}}=\Big(\frac{\partial L}{\partial w_{n-1}} - \frac{a_n(a_{n-1}-1)}{a_{n-1}}\frac{\partial L}{\partial w_{n}} \Big)T_{n-1},$$
this part
$$\frac{\partial L}{\partial w_n} \frac{\partial w_n}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}} $$
is incomplete if we want the gradient of $\alpha_{n-1}$ from $w_n$, because
$$w_n = w_{n-1}\frac{\alpha_n}{\alpha_{n-1}}(1-\alpha_{n-1})=w_{n-1}\Big(\frac{\alpha_n}{\alpha_{n-1}}-\alpha_n\Big).$$
So, actually, the gradient of $\alpha_{n-1}$ from $w_n$ is
$$\frac{\partial L}{\partial w_n} \Big( \frac{\partial w_n}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial a_{n-1}} + \frac{\partial w_n}{\partial \big(\frac{\alpha_n}{\alpha_{n-1}}-\alpha_n\big)} \frac{\partial \big(\frac{\alpha_n}{\alpha_{n-1}}-\alpha_n\big)}{\partial a_{n-1}}\Big),$$
which equals to
$$\frac{\partial L}{\partial w_n} \Big( \frac{\alpha_n}{\alpha_{n-1}} (1-\alpha_{n-1}) T_{n-1} + w_{n-1} \Big(-\frac{\alpha_n}{\alpha_{n-1}^2}\Big)\Big),$$
where the first part is what you got. But we still have the second part, after adding which, using $w_{n-1}=\alpha_{n-1}T_{n-1}$, we get
$$-\frac{\partial L}{\partial w_n} \alpha_n T_{n-1}.$$
Then we have that
$$\frac{\partial L}{\partial a_{n-1}} = \Big(\frac{\partial L}{\partial w_{n-1}} - \alpha_n\frac{\partial L}{\partial w_{n}} \Big)T_{n-1},$$
which is consistent with the code, and it is correct.
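The corrected two-term result can also be checked numerically. The sketch below (hypothetical Python, not project code) treats the upstream gradients $\partial L/\partial w_i$ as fixed coefficients `g[i]`, perturbs the second-to-last alpha with a central finite difference, and compares against $(\partial L/\partial w_{n-1} - \alpha_n\,\partial L/\partial w_n)\,T_{n-1}$:

```python
import random

# alpha-compositing weights w_k = alpha_k * prod_{i<k} (1 - alpha_i)
def weights(alpha):
    T, w = 1.0, []
    for a in alpha:
        w.append(a * T)
        T *= 1 - a
    return w

random.seed(2)
n = 4
alpha = [random.uniform(0.1, 0.9) for _ in range(n)]
g = [random.random() for _ in range(n)]      # stand-ins for dL/dw_i
L = lambda al: sum(gi * wi for gi, wi in zip(g, weights(al)))

# transmittance in front of the second-to-last Gaussian (T_{n-1} in the thread's notation)
T_front = 1.0
for a in alpha[:n - 2]:
    T_front *= 1 - a

closed = (g[n - 2] - alpha[n - 1] * g[n - 1]) * T_front   # the corrected formula

eps = 1e-6
ap, am = list(alpha), list(alpha)
ap[n - 2] += eps; am[n - 2] -= eps
fd = (L(ap) - L(am)) / (2 * eps)
assert abs(closed - fd) < 1e-6
```

Since the perturbed Gaussian is second to last, only its own weight and the last weight depend on it, so the two-term closed form is exact here.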
@ranrandy Hi, thanks a lot for your explanation. Now it is clear to me how the derivative is obtained. Appreciate it!
A decent and rigorous mathematical derivation! From my side, I derived it from the perspective of a sequence-to-sequence mapping $\{a_k\} \to \{T_k\} \to \{w_k\}$, using ideas like BPTT (back-propagation through time) from RNNs. @ranrandy makes this even clearer!
Thanks everyone for the involvement and discussion.
In particular, do final_A and final_D on line 367 need to change as the loop iterates?