Thanks for the great research and code sharing.
After reading the paper and using the code in my own research, I have a question.
There are two ways to implement the weighted loss:
Case 1) L = w_a * L_a + w_b * L_b + w_c * L_c
Case 2) L = L_a + w_b * L_b + w_c * L_c
In Case 2, the weight of L_a is fixed to 1. My guess is that w_b and w_c would then be learned relative to L_a, with their log_vars values adjusting accordingly.
In your paper and code, on the other hand, all weights (i.e., all log_vars) are learnable, as in Case 1.
Is there a reason to prefer Case 1? Would it cause a problem if I used Case 2 instead?
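
To make the two cases concrete, here is a minimal sketch of what I mean. It is not the repository's code: the function name `weighted_loss` and the `fix_first` flag are hypothetical, and I am assuming each weight is parameterized as w_i = exp(-log_var_i) with log_var_i added as a regularizer so the weights cannot all shrink to zero.

```python
import math


def weighted_loss(losses, log_vars, fix_first=False):
    """Combine per-task losses using weights w_i = exp(-log_var_i).

    Assumptions (not taken from the paper's code): the weight is
    exp(-log_var) and log_var itself is added as a regularizer term.
    fix_first=False corresponds to Case 1 (all weights learnable);
    fix_first=True corresponds to Case 2 (w_a pinned to 1).
    """
    total = 0.0
    for i, (loss, log_var) in enumerate(zip(losses, log_vars)):
        if fix_first and i == 0:
            # Case 2: ignore the first log_var, so w_a = exp(0) = 1
            # and the first task contributes no regularizer term.
            log_var = 0.0
        total += math.exp(-log_var) * loss + log_var
    return total


losses = [1.0, 2.0, 3.0]               # L_a, L_b, L_c
log_vars = [math.log(2.0), 0.0, 0.0]   # w_a = 0.5 in Case 1

case1 = weighted_loss(losses, log_vars)                  # uses w_a = 0.5
case2 = weighted_loss(losses, log_vars, fix_first=True)  # uses w_a = 1
```

In a real training loop the log_vars would be learnable parameters; in Case 2 the first one would simply be excluded from the optimizer.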