Estimation of negative probabilities with predict

albertocarm commented 3 years ago

I send a reproducible example showing negative probabilities. This occurs when one of the 'y' levels is rare, and the predictor has an extreme value in its range, so I have been slow to notice the problem.

set.seed(836)
data<- data.frame(HE6=sample(1:10, 200, replace = TRUE, prob=c(rep(0.1,6),0.01,0.002,0.19,0.198) ), 
                  Age = sample(1:85, 200, replace = TRUE), EORTC = sample(1:100, 200, replace = TRUE), 
                  linf=rbinom(200, 1,.5),
                  cir=rbinom(200, 1,.5),esquema=rbinom(200, 1,.5), riesgo=factor(rbinom(200, 2,.5)), estadio=factor(rbinom(200, 2,.5)))
head(data)
table(data$HE6)
dd<-datadist(data)
options(datadist='dd')

bsx <- blrm(HE6  ~ cir*rcs(Age,3)+ linf+ pol(EORTC)+esquema+estadio+riesgo, 
            ~ rcs(Age,3)+ pol(EORTC), cppo=function(y) y, data=data) 

newdata <- data.frame(cir=0, Age=85, EORTC= 10, linf=0, riesgo=0, esquema=1, estadio=1)
predict(bsx, newdata, type='fitted.ind') #

harrelfe commented 3 years ago

Thanks for the reproducible example. I tried

for(s in 4 : 10) {
  cat('s:', s, '\n')
f <- blrm(HE6  ~ cir*rcs(Age,3)+ linf+ pol(EORTC)+esquema+estadio+riesgo, 
            ~ rcs(Age,3)+ pol(EORTC), cppo=function(y) y, data=data, seed=s) 

  print(predict(f, newdata, type='fitted.ind'))
}

and could not get any negative point estimates although I saw a few lower uncertainty intervals that were slightly negative.

albertocarm commented 3 years ago

That is because it is a subtle issue, difficult to reproduce. It only seems to occur if some levels of the dependent variable are rare, and use extreme values for continuous predictors. To reproduce the error please use set.seed(836) to obtain exactly the same dataset. Notice the low frequency for y= 7 and y = 8. With my exact code you get this output, with an estimated negative probability for these same levels:

> set.seed(836)
> data<- data.frame(HE6=sample(1:10, 200, replace = TRUE, prob=c(rep(0.1,6),0.01,0.002,0.19,0.198) ), 
+                   Age = sample(1:85, 200, replace = TRUE), EORTC = sample(1:100, 200, replace = TRUE), 
+                   linf=rbinom(200, 1,.5),
+                   cir=rbinom(200, 1,.5),esquema=rbinom(200, 1,.5), riesgo=factor(rbinom(200, 2,.5)), estadio=factor(rbinom(200, 2,.5)))
> head(data)
  HE6 Age EORTC linf cir esquema riesgo estadio
1   6  85    42    0   0       1      1       0
2   5  49    49    1   0       0      1       2
3   6  31    58    1   0       0      1       2
4   6  73    81    1   1       1      0       0
5  10  81     6    1   0       0      1       1
6   9  28    28    0   0       0      0       1
> table(data$HE6)

 1  2  3  4  5  6  7  8  9 10 
18 16 22 13 27 22  3  1 46 32 
> dd<-datadist(data)
> options(datadist='dd')
> 
> bsx <- blrm(HE6  ~ cir*rcs(Age,3)+ linf+ pol(EORTC)+esquema+estadio+riesgo, 
+             ~ rcs(Age,3)+ pol(EORTC), cppo=function(y) y, data=data) 
Warning messages:
1: There were 920 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
2: Examine the pairs() plot to diagnose sampling problems

3: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.

> #plot(summary (bsx, ycut=5))
> #plot(summary(bsx)) 
> 
> newdata <- data.frame(cir=0, Age=85, EORTC= 10, linf=0, riesgo=0, esquema=1, estadio=1) # aquí está la interfaz con .net
> 
> predict(bsx, newdata, type='fitted.ind') # 
        y x   Mean  Lower Upper
1   HE6=1 1  0.078  0.006 0.189
2   HE6=2 1  0.064  0.009 0.132
3   HE6=3 1  0.088  0.025 0.156
4   HE6=4 1  0.058  0.011 0.107
5   HE6=5 1  0.119  0.048 0.180
6   HE6=6 1  0.101  0.045 0.159
7   HE6=7 1 -0.006 -0.044 0.033
8   HE6=8 1 -0.010 -0.050 0.028
9   HE6=9 1  0.262  0.154 0.366
10 HE6=10 1  0.245  0.052 0.469

albertocarm commented 3 years ago

Therefore, when I try to plot Pr (y>= j) I get a zigzag instead of a downward curve

> predict(bsx, newdata, type='fitted') # 
      y x  Mean Lower Upper
1  y>=2 1 0.922 0.811 0.994
2  y>=3 1 0.858 0.690 0.986
3  y>=4 1 0.770 0.549 0.948
4  y>=5 1 0.712 0.476 0.918
5  y>=6 1 0.593 0.318 0.833
6  y>=7 1 0.492 0.227 0.752
7  y>=8 1 0.497 0.221 0.753
8  y>=9 1 0.507 0.241 0.785
9 y>=10 1 0.245 0.052 0.469

harrelfe commented 3 years ago

Since ordinality is relaxed with respect to two of the variables, it may be reasonable to expect this to occur (as long as values are just slightly negative) for a few posterior sampling paths. We may want to override negatives with zeros. It would be interested to learn if this ever happens for large sample sizes.

albertocarm commented 3 years ago

I have repeated it with 2000 simulated observations. The problem is minor, but it is still happening. Then, for the probability Pr (y >= j) you suggest adding up all the individual probabilities up to j, changing the negatives to zero, but he sum of point probabilities will no longer be 1. And wouldn't a positive prior be better?

harrelfe / rmsb

Estimation of negative probabilities with predict #4