xmed-lab / UCVME

AAAI 2023: Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks

Unreasonable: the loss is optimized to a negative value #1

Open xueqings opened 1 month ago

xueqings commented 1 month ago
The loss is negative

According to the derivation in your supplementary materials, the heteroscedastic loss should be positive, and the total loss should also be positive. However, during training the loss is optimized to a negative value, and I did not find an obvious lower bound. Is this caused by a mistake in my training setup? Could the author explain why the loss is negative? Thank you very much!

ackbar03 commented 1 month ago

The loss function is stated in equation 1 of the paper and is the average of

(y_i - y)^2 / (2 * Var) + ln(Var) / 2

When Var < 1, ln(Var) < 0, and its magnitude can exceed the squared-error term, so the average becomes negative. This is easily the case if the labels have been standardized to zero mean and unit standard deviation.
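For example, here is a quick numeric sketch of the average above (made-up numbers, not the repository's training code), with standardized labels and a predicted variance below 1:

```python
import numpy as np

# Illustrative values only: labels standardized to zero mean / unit std,
# reasonably accurate predictions, and a predicted variance well below 1.
y_true = np.array([-1.2, 0.3, 0.9])   # standardized labels
y_pred = np.array([-1.0, 0.2, 1.0])   # model predictions
var = np.array([0.05, 0.05, 0.05])    # predicted variance < 1

# Average of (y_i - y)^2 / (2 * Var) + ln(Var) / 2
loss = np.mean((y_true - y_pred) ** 2 / (2 * var) + np.log(var) / 2)
print(loss)  # ~ -1.30: ln(0.05)/2 ~ -1.50 outweighs the small squared-error term
```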

xueqings commented 1 month ago

The loss function is stated in equation 1 of the paper and is the average of

(y_i - y)^2 / (2 * Var) + ln(Var) / 2

When Var < 1, ln(Var) < 0, and its magnitude can exceed the squared-error term, so the average becomes negative. This is easily the case if the labels have been standardized to zero mean and unit standard deviation.

Thank you very much for your answer. I still have some questions: ① In the author's supplementary proof, the objective function is derived from the negative log-likelihood; looking at equations (4) and (5), it is obvious that the value of the objective function cannot be negative. ② In addition, if the loss function is negative, it should also have a lower bound. Otherwise, aren't the parameters optimal when the loss value is negative infinity? I did not find a clear lower bound in the code either. If you can answer my questions, I would be extremely grateful.

ackbar03 commented 1 month ago

Regarding 1)

looking at equations (4) and (5), it is obvious that the value of the objective function cannot be negative;

It seems your confusion is mainly because of this. This statement is false.

I suspect you are confusing the concept of probability density function with the concept of probability.

p(y|x) can easily be greater than 1, in which case log p(y|x) is positive and the negative log-likelihood (the loss) is negative (see https://www.zhihu.com/question/26344963 or https://math.stackexchange.com/questions/1720053/how-can-a-probability-density-function-pdf-be-greater-than-1). If you are really concerned about this issue, I suggest you revisit the concepts of maximum likelihood estimation and probability theory; a GitHub issue is not the right place to explain them.
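A tiny self-contained check of this point (plain Python, illustrative numbers): a Gaussian density with a small standard deviation exceeds 1 near its mean, so the per-sample negative log-likelihood is negative there.

```python
import math

sigma = 0.1
y, mu = 0.0, 0.0   # observation exactly at the predicted mean

# Gaussian probability density p(y | mu, sigma)
pdf = math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
print(pdf)             # ~ 3.99 -> a density, not a probability, so > 1 is fine
print(-math.log(pdf))  # ~ -1.38 -> negative loss, nothing is wrong
```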

Regarding 2)

In addition, if the loss function is negative, it should also have a lower bound. Otherwise, aren't the parameters optimal when the loss value is negative infinity?

Why does a loss function need a lower bound? It is not an issue if the optimal parameters correspond to a loss that tends to negative infinity. This happens, for example, if the data can be modeled perfectly with a linear regression model and almost zero error: the squared-error term stays near zero while the predicted variance shrinks toward zero, so ln(Var)/2 decreases without bound.
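To make that concrete, a small sketch (illustrative numbers only, not the repository's code) of how the per-sample loss keeps decreasing as the predicted variance shrinks on a perfectly fit sample:

```python
import math

# With a (near-)perfect fit the squared-error term stays ~0, and the loss is
# dominated by ln(Var)/2, which decreases without bound as Var -> 0.
residual = 0.0
for var in [1e-1, 1e-2, 1e-4, 1e-8]:
    loss = residual ** 2 / (2 * var) + math.log(var) / 2
    print(f"Var = {var:g}  ->  loss = {loss:.2f}")
# Var = 0.1     ->  loss = -1.15
# Var = 0.01    ->  loss = -2.30
# Var = 0.0001  ->  loss = -4.61
# Var = 1e-08   ->  loss = -9.21
```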