pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0

MultivariateTE.network_analysis.analyse_single_target() sometimes produces NaNs #34

Closed: SantoshManicka closed this issue 3 years ago

SantoshManicka commented 5 years ago

I got the following warning when I ran MultivariateTE.network_analysis.analyse_single_target() on my data for a particular target process:

~/lib/python3.6/site-packages/idtxl/stats.py:1408: RuntimeWarning: invalid value encountered in greater_equal
  pvalue = sum(distribution >= statistic) / distribution.shape[0]

This resulted in 'selected_sources_te' containing NaNs throughout.

If it's not clear what caused the above warning, I'll try to post my data here.
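One possible reading of the warning, offered as an illustration rather than a diagnosis from the thread: the quoted line computes a permutation p-value as the fraction of surrogate statistics that are at least as large as the observed one. Every comparison involving NaN evaluates to False, so if the statistic (and the surrogates) are NaN, the p-value comes out as 0, which would look maximally significant. That could explain how sources end up selected with NaN TE values. The sketch below reproduces only that one expression; `empirical_pvalue` and the guarded `checked_pvalue` are hypothetical helpers, not IDTxl functions.

```python
import numpy as np

def empirical_pvalue(distribution, statistic):
    """Fraction of surrogate values >= the observed statistic
    (mirrors the expression quoted in the warning)."""
    distribution = np.asarray(distribution, dtype=float)
    return np.sum(distribution >= statistic) / distribution.shape[0]

def checked_pvalue(distribution, statistic):
    """Hypothetical guarded variant: refuse to compute a p-value from NaNs/infs."""
    distribution = np.asarray(distribution, dtype=float)
    if not np.isfinite(statistic) or not np.all(np.isfinite(distribution)):
        raise ValueError("non-finite statistic or surrogate distribution")
    return empirical_pvalue(distribution, statistic)

# Every surrogate estimate came out NaN, and so did the observed statistic.
surrogates = np.full(5000, np.nan)
with np.errstate(invalid="ignore"):  # older NumPy versions warn on NaN comparisons
    p = empirical_pvalue(surrogates, np.nan)
print(p)  # 0.0, i.e. "significant" at any alpha
```

Raising (or returning NaN) on non-finite input, as `checked_pvalue` does, makes the failure visible instead of letting it pass the significance test silently.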

pwollstadt commented 5 years ago

Hi @SantoshManicka, what kind of data are you trying to analyze? I agree, it would be good to have a look. Let me know how we can access your data (you can also email me). Thanks!

jlizier commented 5 years ago

@SantoshManicka, could you also send the script you use to start the analysis, so we can see all of the settings? In particular, are you using the linear-Gaussian estimator? My current hunch is that some of the processes are linearly redundant (one possibility being a source that has a constant value across all samples).
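Why a constant source can break a linear-Gaussian estimator is easiest to see from the textbook closed form of conditional mutual information for Gaussian variables, which is built from log-determinants of covariance matrices. A constant column makes the covariance singular (log-determinant of minus infinity), and the formula then evaluates minus infinity minus minus infinity, which is NaN. This is a sketch of the standard formula only; we are not claiming JIDT computes it this way internally, and `gaussian_cmi` and the toy data are our own illustration.

```python
import numpy as np

def _logdet_cov(*cols):
    """Log-determinant of the sample covariance of the given 1-D arrays.
    Returns -inf when the covariance matrix is singular (e.g. a constant column)."""
    cov = np.atleast_2d(np.cov(np.column_stack(cols), rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    return logdet if sign > 0 else -np.inf

def gaussian_cmi(x, y, z):
    """Textbook Gaussian CMI in nats:
    I(X;Y|Z) = 0.5 * (ln|S_xz| + ln|S_yz| - ln|S_z| - ln|S_xyz|)."""
    with np.errstate(invalid="ignore"):  # -inf - (-inf) gives NaN; silence the warning
        return 0.5 * (_logdet_cov(x, z) + _logdet_cov(y, z)
                      - _logdet_cov(z) - _logdet_cov(x, y, z))

rng = np.random.default_rng(0)
n = 1000
z = rng.standard_normal(n)              # e.g. the target's own past
y = 0.5 * z + rng.standard_normal(n)    # e.g. the target's present value
x_const = np.ones(n)                    # a constant "input" source

print(gaussian_cmi(x_const, y, z))      # nan
x_noisy = x_const + 1e-6 * rng.standard_normal(n)
print(gaussian_cmi(x_noisy, y, z))      # a small finite value
```

The same singularity arises whenever one variable is an exact linear combination of the others, not only when it is constant.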

SantoshManicka commented 5 years ago

@jlizier you are right: my dataset has a set of "input" nodes that have constant states, and yes, I used the 'JidtGaussianCMI' estimator. However, the warning doesn't appear for every node; some nodes that received significant information from an input node are unaffected. I'm not sure I still have the dataset (I'll check), but these are the settings I used:

settings = {'cmi_estimator': 'JidtGaussianCMI',
            'n_perm_max_stat': 5000,
            'n_perm_min_stat': 5000,
            'max_lag_sources': 100,
            'min_lag_sources': 1}

jlizier commented 5 years ago

Ok, great: it seems the constant states are the problem. This is a known issue for that particular estimator. I have been working on a fix and will upload it soon (basically, the results will be hard-coded to zero for processes with no variance). In the meantime, the suggested workaround is to add a small amount of noise to each time series (say, on the scale of 10e-6 times the standard deviation of the data, or at the 10e-6 scale if the standard deviation is zero). Can you try that and let us know whether everything works afterwards?
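The interim workaround can be sketched in NumPy before the data is handed to IDTxl. Assumptions: the data is an array with one row per process and one column per sample; the helper name `add_small_noise` and its `rel_scale` and `seed` parameters are hypothetical. `rel_scale` defaults to 10e-6 exactly as written in the comment, which Python reads as 1e-5; pass 1e-6 if that was the intended scale.

```python
import numpy as np

def add_small_noise(data, rel_scale=10e-6, seed=None):
    """Return a jittered copy of `data` (processes x samples).

    Each process gets Gaussian noise with std = rel_scale * its own std,
    or std = rel_scale when the process is constant (std == 0)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    std = data.std(axis=1, keepdims=True)              # per-process std dev
    noise_std = np.where(std > 0, rel_scale * std, rel_scale)
    return data + noise_std * rng.standard_normal(data.shape)

inputs = np.ones((2, 1000))                            # two constant "input" processes
other = np.random.default_rng(1).standard_normal((1, 1000))
data = np.vstack([inputs, other])
jittered = add_small_noise(data, seed=42)
print(jittered.std(axis=1) > 0)                        # [ True  True  True]
```

Scaling the noise to each process's own standard deviation keeps the perturbation negligible relative to the signal while guaranteeing that no process has zero variance.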

pwollstadt commented 3 years ago

Hi all, I will close this issue since there has not been any activity recently. Feel free to open it again if the problem persists.