We have observed cases where I(X;Y|Z)<0 because of the way we perform the discretizatinos. It happens when we have some continuous Zs and X or Y (or both) discrete.
We compute the conditional mutual information by maximizing each term in the chain rule :
For each term, we fix the cuts on X and Y optimized when computing I(Y;X,Z) and I(X;Y,Z) respectively ; but we re-initialize the cuts for each Zi and re-optimize them for every term of the sum.
Because of this, there are cases where I(X;Z) (red part) is larger than I(X;Y,Z) (red+green).
As you can see this is not supposed to happen, I suppose this is due to local maximums which are slightly more prone to be found when optimizing on |Z|+1 variables, as in I(X;Y,Z) vs |Z| (as in I(X;Z)).
Note that we can't just re-use the same cutpoints of Z for all terms : as an extreme example, imagine a case where I'(X;Z) = 0, implying 1 single bin for Zs, whereas I'(X;Y,Z) > I'(X;Y) > 0, i.e. some information is contained in the interaction between Y and Z, for which we need multi-bins discretization on Zs.
As a solution we can :
Try to avoid local maximas : although this is not fullproof, we should re-initialize Z with a number of bins that depends on |Z| to avoid the curse of dimensionality. For now it is N**(1/3), which we can change to e.g. N**(1/(2+|Z|)) or N**(1/3)/|Z|.
Guarantee that we respect I'(X;Y,Z) >= I'(X;Y) by setting either I(X:YZ) or Z bins to the values obtained when optimizing I(X;Z). Here we use external knowledge (red+green part is larger than red part) to fix optimizations that manifestly went wrong. The problem with this solution is that we don't fix cases that go the other way i.e. underestimate I'(X;Y). If we choose to set the same bins we can still re-do some optimizations on Zs and hope to increase I'(X;Y,Z).
We have observed cases where
I(X;Y|Z)<0
because of the way we perform the discretizatinos. It happens when we have some continuousZ
s andX
orY
(or both) discrete.We compute the conditional mutual information by maximizing each term in the chain rule :
https://github.com/miicTeam/miic_R_package/blob/661649d4bb7d6f7859bc0e54e22bba1d22515d26/src/computation_continuous.cpp#L732-L734
For each term, we fix the cuts on
X
andY
optimized when computingI(Y;X,Z)
andI(X;Y,Z)
respectively ; but we re-initialize the cuts for eachZi
and re-optimize them for every term of the sum.Because of this, there are cases where
I(X;Z)
(red part) is larger thanI(X;Y,Z)
(red+green).As you can see this is not supposed to happen, I suppose this is due to local maximums which are slightly more prone to be found when optimizing on
|Z|+1
variables, as inI(X;Y,Z)
vs|Z|
(as inI(X;Z)
).Note that we can't just re-use the same cutpoints of
Z
for all terms : as an extreme example, imagine a case whereI'(X;Z) = 0
, implying 1 single bin forZ
s, whereasI'(X;Y,Z) > I'(X;Y) > 0
, i.e. some information is contained in the interaction betweenY
andZ
, for which we need multi-bins discretization onZ
s.As a solution we can :
Z
with a number of bins that depends on|Z|
to avoid the curse of dimensionality. For now it isN**(1/3)
, which we can change to e.g.N**(1/(2+|Z|))
orN**(1/3)/|Z|
.I'(X;Y,Z) >= I'(X;Y)
by setting eitherI(X:YZ)
orZ
bins to the values obtained when optimizingI(X;Z)
. Here we use external knowledge (red+green part is larger than red part) to fix optimizations that manifestly went wrong. The problem with this solution is that we don't fix cases that go the other way i.e. underestimateI'(X;Y)
. If we choose to set the same bins we can still re-do some optimizations onZ
s and hope to increaseI'(X;Y,Z)
.