spedygiorgio / markovchain

Easy Handling Discrete Time Markov Chains
https://spedygiorgio.github.io/markovchain/

Reject null hypothesis when 0 in the object matrix is not 0 in the data matrix in verifyEmpiricalToTheoretical function #197

Closed ebbertd closed 4 years ago

ebbertd commented 4 years ago

Consider the following example:

library(markovchain)

example <- matrix(c(
  0.6105, 0.1665, 0.0393, 0.1837,
  0.1374, 0.5647, 0.0637, 0.2342,
  0.3010, 0.1142, 0.3218, 0.2630,
  0.2595, 0.3109, 0.0000, 0.4296
),
byrow = TRUE,
nrow = 4
)
rownames(example) <- c(1:4)
colnames(example) <- c(1:4)

mc <- matrix(c(
  0.00, 1.00, 0.00, 0.00,
  0.00, 0.00, 0.50, 0.50,
  0.00, 0.75, 0.00, 0.25,
  0.00, 0.75, 0.25, 0.00
),
byrow = TRUE,
nrow = 4
)
rownames(mc) <- c(1:4)
colnames(mc) <- c(1:4)
theoreticalMc <- as(mc, "markovchain")

verifyEmpiricalToTheoretical(data = example, object = theoreticalMc)

This results in the following:

Testing whether the
       1      2      3      4
1 0.6105 0.1665 0.0393 0.1837
2 0.1374 0.5647 0.0637 0.2342
3 0.3010 0.1142 0.3218 0.2630
4 0.2595 0.3109 0.0000 0.4296
transition matrix is compatible with
  1    2    3    4
1 0 1.00 0.00 0.00
2 0 0.00 0.50 0.50
3 0 0.75 0.00 0.25
4 0 0.75 0.25 0.00
[1] "theoretical transition matrix"
ChiSq statistic is -1.760846 d.o.f are 3 corresponding p-value is 1 
$statistic
        1 
-1.760846 

$dof
[1] 3

$pvalue
1 

The null hypothesis is that the two matrices are equal, so a p-value of 1 suggests that they are. However, they obviously are not: there are several cells where object[i, j] == 0 but data[i, j] > 0, so the two matrices are far from equal.
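To make the mismatch concrete, here is a minimal base-R sketch that lists the cells where the theoretical matrix is 0 but the data matrix is positive. It re-creates the two matrices from the example without dimnames and does not call markovchain; the name `conflicts` is only illustrative.

```r
# Data matrix from the example above.
example <- matrix(c(
  0.6105, 0.1665, 0.0393, 0.1837,
  0.1374, 0.5647, 0.0637, 0.2342,
  0.3010, 0.1142, 0.3218, 0.2630,
  0.2595, 0.3109, 0.0000, 0.4296
), byrow = TRUE, nrow = 4)

# Theoretical transition matrix from the example above.
mc <- matrix(c(
  0.00, 1.00, 0.00, 0.00,
  0.00, 0.00, 0.50, 0.50,
  0.00, 0.75, 0.00, 0.25,
  0.00, 0.75, 0.25, 0.00
), byrow = TRUE, nrow = 4)

# Cells that the theoretical matrix declares impossible (0)
# but that are positive in the data matrix.
conflicts <- which(mc == 0 & example > 0, arr.ind = TRUE)
print(conflicts)
print(nrow(conflicts))
```

Every zero of the theoretical matrix is matched by a positive entry in the data matrix here, so each row of `conflicts` is a cell that on its own contradicts the null hypothesis.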

I had a look at the paper this work is based on, and Kullback et al. (1962) state on page 596:

If the null hypothesis specifies that there are c instances for which P(Ej | Ei) = 0, then in table 7.2 we take fij ln P(Ej | Ei) = 0 and fij ln P(EiEj) = 0 (it is obvious that we reject the null hypothesis if fij > 0 in any such case); we also replace r² - 1 by r² - c - 1 and r(r - 1) by r(r - 1) - c.

Applied to the example above: whenever object[i, j] == 0 and the corresponding data[i, j] > 0, the null hypothesis needs to be rejected.
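A rough sketch of how such a guard could look, following the quoted passage. `checkStructuralZeros` is a hypothetical helper, not a markovchain function, and the count matrix and 2-state chain below are made up for illustration. It takes observed transition counts and a theoretical transition matrix, flags the automatic rejection, and reports the adjusted degrees of freedom r(r - 1) - c.

```r
# Hypothetical helper (not part of markovchain): checks the structural
# zeros of a theoretical transition matrix against observed counts.
checkStructuralZeros <- function(data, theoretical) {
  stopifnot(identical(dim(data), dim(theoretical)))
  zero <- theoretical == 0      # transitions the null hypothesis forbids
  r <- nrow(theoretical)        # number of states
  nZero <- sum(zero)            # "c" in the quoted passage
  list(
    reject = any(data[zero] > 0),   # any forbidden transition observed?
    c      = nZero,
    dof    = r * (r - 1) - nZero    # r(r - 1) replaced by r(r - 1) - c
  )
}

# Made-up 2-state example: the transition 1 -> 1 is forbidden under the
# null hypothesis but was observed twice.
theoretical <- matrix(c(0.0, 1.0,
                        0.5, 0.5), byrow = TRUE, nrow = 2)
counts <- matrix(c(2, 8,
                   5, 5), byrow = TRUE, nrow = 2)

res <- checkStructuralZeros(counts, theoretical)
print(res)
```

In this toy case `res$reject` is TRUE, so the test could stop and reject before any statistic is computed. Whether verifyEmpiricalToTheoretical should return early or report a p-value of 0 in that situation is a design choice left open here.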