Negative CMI with continuous Zs

We have observed cases where I(X;Y|Z)<0 because of the way we perform the discretizatinos. It happens when we have some continuous Zs and X or Y (or both) discrete.

We compute the conditional mutual information by maximizing each term in the chain rule :

https://github.com/miicTeam/miic_R_package/blob/661649d4bb7d6f7859bc0e54e22bba1d22515d26/src/computation_continuous.cpp#L732-L734

For each term, we fix the cuts on X and Y optimized when computing I(Y;X,Z) and I(X;Y,Z) respectively ; but we re-initialize the cuts for each Zi and re-optimize them for every term of the sum.

Because of this, there are cases where I(X;Z) (red part) is larger than I(X;Y,Z) (red+green).

CMI

As you can see this is not supposed to happen, I suppose this is due to local maximums which are slightly more prone to be found when optimizing on |Z|+1 variables, as in I(X;Y,Z) vs |Z| (as in I(X;Z)).

Note that we can't just re-use the same cutpoints of Z for all terms : as an extreme example, imagine a case where I'(X;Z) = 0, implying 1 single bin for Zs, whereas I'(X;Y,Z) > I'(X;Y) > 0, i.e. some information is contained in the interaction between Y and Z, for which we need multi-bins discretization on Zs.

As a solution we can :

Try to avoid local maximas : although this is not fullproof, we should re-initialize Z with a number of bins that depends on |Z| to avoid the curse of dimensionality. For now it is N**(1/3), which we can change to e.g. N**(1/(2+|Z|)) or N**(1/3)/|Z|.
Guarantee that we respect I'(X;Y,Z) >= I'(X;Y) by setting either I(X:YZ) or Z bins to the values obtained when optimizing I(X;Z). Here we use external knowledge (red+green part is larger than red part) to fix optimizations that manifestly went wrong. The problem with this solution is that we don't fix cases that go the other way i.e. underestimate I'(X;Y). If we choose to set the same bins we can still re-do some optimizations on Zs and hope to increase I'(X;Y,Z).

miicTeam / miic_R_package

Negative CMI with continuous Zs #101