Why is PID not permutationally-invariant?

aleksejs-fomins commented 3 years ago

Dear IDTxl developers,

I have been trying to evaluate triplet PID for my data, and found that at least the Tartu estimator is not invariant under some permutations of the 3 variables. I would naively expect that:

a) Redundancy of 3 variables is the same for all 6 permutations.
b) Synergy of 3 variables is the same for all 6 permutations.
c) Unique info is invariant under exchange of the sources.
d) Unique info is invariant under exchange of first source and target.

A small test shows that out of those only c) holds, but a), b) and d) don't hold.

Could somebody please help me understand this better:

Is this the expected behaviour?
From the point of view of synergy or redundancy, is a target somehow special compared to a source?
What is the difference between unique info and conditional mutual information (in the context of 3 variables)?

Here is a minimal example

import numpy as np
import pandas as pd
from itertools import permutations

from idtxl.bivariate_pid import BivariatePID
from idtxl.data import Data

def bivariate_pid_3D(data):
    settings = {
        'settings_estimator': {'pid_estimator': 'TartuPID', 'lags_pid': [0, 0]},
        'src': [0, 1],
        'trg': 2
    }

    dataIDTxl = Data(data, dim_order='rps', normalise=False)
    pid = BivariatePID()

    rez = pid.analyse_single_target(settings=settings['settings_estimator'], data=dataIDTxl, target=settings['trg'], sources=settings['src'])

    return np.array([
        rez.get_single_target(settings['trg'])['unq_s1'],
        rez.get_single_target(settings['trg'])['unq_s2'],
        rez.get_single_target(settings['trg'])['shd_s1_s2'],
        rez.get_single_target(settings['trg'])['syn_s1_s2']
    ])

nChannel = 3
nTrial = 20
nTime = 100

dataRPS = np.random.randint(0, 4, (nTrial, nChannel, nTime)).astype(int)

####################################
# Test 1: Consistency across permutations
####################################

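rez = []
pLst = list(permutations([0,1,2]))
for p in pLst:
    # Reorder the channel axis; after reordering, channels 0 and 1 are the sources and channel 2 is the target
    rez += [[p] + list(bivariate_pid_3D(dataRPS[:, list(p)]))]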

print(pd.DataFrame(rez, columns=['perm', 'U1', 'U2', 'Red', 'Syn']))

Abzinger commented 3 years ago

Dear Aleksejs,

First, to make sure that we are on the same page: by permutation, you mean permuting the sources and target. To fix the notation, I will refer to the original triplet as T, S_1, and S_2, where T is the target and S_1 and S_2 are the sources.

a), b) and d) don't hold for any PID estimation

In general, a), b) and d) don't hold for any PID measure (I will get back to c) later).

The short answer is that the target in PID is special compared to sources (the symmetry is not generic). To explain this further,

PID is the decomposition of the information that the sources jointly have about the target. Therefore, exchanging the target with any of the sources should in general yield a different decomposition. For instance,

RI(T: S_1, S_2) is the information that can be acquired about T by either accessing S_1 or S_2
RI(S_1: T, S_2) is the information that can be acquired about S_1 by either accessing T or S_2

RI(T: S_1, S_2) and RI(S_1: T, S_2) are conceptually different. The same holds for all the other PID terms. In addition, MI(T: S_1, S_2) is in general not equal to MI(S_1: T, S_2), so why should their PID decompositions be the same?
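
To illustrate that MI(T: S_1, S_2) and MI(S_1: T, S_2) can differ, here is a minimal sketch (plain NumPy, not IDTxl code). It assumes a toy system in which S_1 and S_2 are independent uniform bits and T = S_1 AND S_2; the helper names entropy, marginal and mutual_info are hypothetical and only serve this example.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def marginal(joint, keep):
    """Marginalise the joint pmf onto the axes listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop) if drop else joint

def mutual_info(joint, a, b):
    """I(A; B) = H(A) + H(B) - H(A, B), with A and B given as tuples of axes."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

# Assumed toy system: array axes are (T, S_1, S_2), with T = S_1 AND S_2
joint = np.zeros((2, 2, 2))
for s1 in (0, 1):
    for s2 in (0, 1):
        joint[s1 & s2, s1, s2] = 0.25

T, S1, S2 = 0, 1, 2
mi_about_t = mutual_info(joint, (T,), (S1, S2))    # MI(T: S_1, S_2)
mi_about_s1 = mutual_info(joint, (S1,), (T, S2))   # MI(S_1: T, S_2)

print(f"MI(T: S_1, S_2) = {mi_about_t:.3f} bits")   # about 0.811
print(f"MI(S_1: T, S_2) = {mi_about_s1:.3f} bits")  # 0.500
```

Since the two quantities being decomposed are already different (about 0.811 bits versus 0.5 bits), the four PID terms cannot all coincide once the target is swapped with a source in this toy system.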

c) Unique info is invariant under exchange of the sources.

Regarding point (c): did you mean that you exchanged S_1 and S_2, i.e. formed the new triplet (T, X_1, X_2) where X_1 = S_2 and X_2 = S_1, and that based on this exchange you arrived at UI(T: S_1\S_2) = UI(T: X_1\X_2)?

This shouldn't hold in general unless UI(T: S_1\ S_2) = UI(T: S_2\S_1) in the first place. But then (c) trivially holds. (I probably misunderstood what you meant here)

I think that what I have discussed so far should answer your first two questions.

Difference between unique information and conditional mutual information

UI(T: S_1 \ S_2) quantifies the information that S_1 uniquely has about T, i.e. the information that can be acquired about T if and only if S_1 is accessible. To get more intuition, let's dissect the sufficient and necessary conditions.

Sufficient condition: if S_1 is accessible, what information can be acquired about T? That is all the information that S_1 has about T, i.e. MI(T: S_1).
Necessary condition: this is what grants the unique nature of this information, since it requires that if S_1 is not accessible then this information about T can't be acquired, thereby giving S_1 exclusivity over this information.

MI(T: S_1 | S_2) quantifies the information about T that accessing S_1 offers on top of what can be offered by accessing S_2. In other words, this is the information about T that can only be acquired with access to S_1.

From this, MI(T: S_1 | S_2) is composed of two types of PID-information:

information that is unique to S_1 about T, i.e. UI(T: S_1\ S_2)
the information that you can synergistically get about T since you can never get this information if you can't access S_1, i.e. Syn(T: S_1, S_2)

Therefore, MI(T: S_1 | S_2) = UI(T: S_1\ S_2) + Syn(T: S_1, S_2). Note that the shared (redundant) information that S_1 and S_2 have about T is not part of MI(T: S_1 | S_2), since this information about T can be acquired without accessing S_1.
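
The same identity also follows from the standard bivariate PID consistency equations, MI(T: S_1, S_2) = RI + UI(T: S_1\S_2) + UI(T: S_2\S_1) + Syn and MI(T: S_2) = RI + UI(T: S_2\S_1), together with the chain rule MI(T: S_1 | S_2) = MI(T: S_1, S_2) - MI(T: S_2). As a minimal numerical sketch (plain NumPy, not IDTxl code), assume T = S_1 XOR S_2 with independent uniform bits. Then MI(T: S_1) = RI + UI(T: S_1\S_2) = 0, so for non-negative PID terms the unique information is zero, while the conditional mutual information is 1 bit, which must therefore be entirely synergy. The helper names below are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def marginal(joint, keep):
    """Marginalise the joint pmf onto the axes listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop) if drop else joint

def mutual_info(joint, a, b):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

def cond_mutual_info(joint, a, b, c):
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    return (entropy(marginal(joint, a + c)) + entropy(marginal(joint, b + c))
            - entropy(marginal(joint, a + b + c)) - entropy(marginal(joint, c)))

# Assumed toy system: array axes are (T, S_1, S_2), with T = S_1 XOR S_2
joint = np.zeros((2, 2, 2))
for s1 in (0, 1):
    for s2 in (0, 1):
        joint[s1 ^ s2, s1, s2] = 0.25

T, S1, S2 = (0,), (1,), (2,)
mi_t_s1 = mutual_info(joint, T, S1)                 # MI(T: S_1), upper bound on UI(T: S_1\S_2)
cmi_t_s1_s2 = cond_mutual_info(joint, T, S1, S2)    # MI(T: S_1 | S_2) = UI + Syn

print(f"MI(T: S_1)       = {mi_t_s1:.3f} bits")      # 0.000
print(f"MI(T: S_1 | S_2) = {cmi_t_s1_s2:.3f} bits")  # 1.000
```

In this example the conditional mutual information is large although S_1 alone carries no information about T, which is exactly the case where it must not be read as unique information.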

Further reading

I recommend reading:

2017 Pica et al. Invariant Components of Synergy, Redundancy, and Unique Information among Three Variables: they discuss exactly the same question that you have posed and suggest a finer decomposition than PID that is invariant under permutations
2020 Gutknecht et al. Bits and Pieces: Understanding Information Decomposition from Part-whole Relationships and Formal Logic: they explain PID in a very intuitive way by deriving it from part-whole relationships (mereology). In addition, the discussion subsection B1. addresses question 3 in more detail.

aleksejs-fomins commented 3 years ago

Dear Abdullah, Thanks so much for your extensive answer. It seems that I have fundamentally misunderstood PID to some extent. Honestly, after reading your explanations, I have understood what I was wrong about, but I still struggle to completely grasp the meaning of e.g. redundancy. I will have a look at the papers you have suggested.

Abzinger commented 3 years ago

Dear Aleksejs,

Welcome! In fact, it is normal that it takes time to deeply understand the concept of PID. I think the Gutknecht paper is gonna be a game-changer for you since it rebuilds PID from fundamental (easy to grasp) principles.

Cheers

pwollstadt commented 3 years ago

Hi everyone, I will close this. Please feel free to reopen if there is an issue (please also consider using the mailing list for more conceptual questions on the measures implemented in IDTxl).
