mmschlk / shapiq

Shapley Interactions for Machine Learning
https://shapiq.readthedocs.io
MIT License

Fix Unbiased Data Game without pyitlib #137

Closed mmschlk closed 2 months ago

mmschlk commented 2 months ago

pyitlib is out of date and does not seem to be actively maintained. It does not resolve alongside current sklearn releases in the same environment. Hence, we need another way of computing the following functions with scipy.

def total_correlation(data) -> float:
    """Compute the total correlation of a data subset.

    Computes the total correlation C of a set of random variables X_1,...,X_n, defined as
    C(X_1,...,X_n) = H(X_1) + ... + H(X_n) - H(X_1,...,X_n). For more information see:
    https://arxiv.org/pdf/2205.09060.pdf

    Args:
        data: The data subset as a numpy array of shape (n_samples, n_features).

    Returns:
        The total correlation of the data subset.

    Note:
        This function requires the pyitlib package to be installed. You can install it via pip:
        ```
        pip install pyitlib
        ```
    """
    from pyitlib import discrete_random_variable as drv

    return drv.information_multi(data)

def entropy(data):
    """Compute the Shannon entropy H of a set of random variables X_1,...,X_n.

    Args:
        data: The data subset as a numpy array of shape (n_samples, n_features).

    Returns:
        The Shannon entropy of the data subset.

    Note:
        This function requires the pyitlib package to be installed. You can install it via pip:
        ```
        pip install pyitlib
        ```
    """
    from pyitlib import discrete_random_variable as drv

    return drv.entropy_joint(data)
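
One possible direction, as a minimal sketch rather than a drop-in replacement: both quantities can be estimated from empirical counts with numpy and `scipy.stats.entropy`. The sketch assumes discrete, numeric data in the (n_samples, n_features) layout from the docstrings above, uses the plug-in (maximum-likelihood) estimate, and takes the logarithm in base 2. The `base` parameter and the choice of estimator are assumptions of this sketch, and it has not been checked to reproduce pyitlib's outputs numerically.

```python
import numpy as np
from scipy.stats import entropy as scipy_entropy


def entropy(data, base: float = 2.0) -> float:
    """Plug-in estimate of the joint Shannon entropy H(X_1,...,X_n).

    Assumes discrete numeric data of shape (n_samples, n_features).
    """
    data = np.asarray(data)
    if data.ndim == 1:
        # Treat a single feature as a one-column data set.
        data = data.reshape(-1, 1)
    # Count how often each distinct row (joint outcome) occurs.
    _, counts = np.unique(data, axis=0, return_counts=True)
    # scipy normalizes the counts to probabilities before computing the entropy.
    return float(scipy_entropy(counts, base=base))


def total_correlation(data, base: float = 2.0) -> float:
    """C(X_1,...,X_n) = H(X_1) + ... + H(X_n) - H(X_1,...,X_n)."""
    data = np.asarray(data)
    # Sum of the marginal entropies, one per feature column.
    marginal_sum = sum(
        entropy(data[:, j], base=base) for j in range(data.shape[1])
    )
    return float(marginal_sum - entropy(data, base=base))


# Two perfectly dependent binary features versus two independent ones.
dependent = np.array([[0, 0], [1, 1], [0, 0], [1, 1]])
independent = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
tc_dependent = total_correlation(dependent)
tc_independent = total_correlation(independent)
```

For the dependent data the total correlation comes out at about 1 bit, and for the independent data at about 0, which matches the definition in the docstring. Note that `np.unique(..., axis=0)` requires a numeric (non-object) array, so categorical string features would need to be integer-encoded first.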