pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0

Inquiry about Transfer Entropy Results using MultivariateTE's analyse_single_target Method #97

Closed peanutnim closed 1 year ago

peanutnim commented 1 year ago

Dear IDTxl Research and Development Team,

I hope this email finds you well. I am writing to seek clarification on an issue I encountered while using the MultivariateTE library's "analyse_single_target" method to analyze the stock price volatility of several companies.

Initially, I imported the stock volatility data of 18 companies and ran the "analyse_single_target" method, which produced transfer entropy results for 8 of these companies with respect to the target company. Following this, I removed the data of two companies that did not have any transfer entropy to the target company and ran the analysis again. To my surprise, the transfer entropy results of the previous two companies disappeared from the output adjacency matrix. Instead, transfer entropy appeared for two different companies.

Could you please help me understand why this phenomenon occurred? Is there any explanation for the change in transfer entropy results after removing the data of the two companies without transfer entropy to the target company?

I would greatly appreciate any insights or guidance you can provide on this matter. Thank you in advance for your time and attention.

Looking forward to hearing from you.

jlizier commented 1 year ago

Multivariate TE seeks to identify the minimal set of parents that best forms a model for predicting the target's dynamic updates. If those two companies were included in the model, then they should have had (statistically significant) TE to the target: perhaps not pairwise TE in the first round, but conditional TE given the other selected nodes. So when you say that they did not have any transfer entropy, I'm assuming that's reported from the pairwise TE in the first round. When you remove them from the data set, of course they will not appear in the output adjacency matrix when you run the analysis again (I presume that's not what you're surprised about). The two new companies would be appearing because, now that you build the model without the two companies you removed, including these two new companies is able to significantly improve the (reduced) model. Presumably what they contribute is redundant with the removed companies, so they could not be included while the former companies were available: the former companies were a better choice and the new ones didn't add anything beyond them. (Or perhaps there was not enough statistical power to include all of them in the model.) Hope that makes sense. I'm going to close this issue since it's not a bug; you can post to the Google group if you need further explanation (or reopen if there is really a bug that I'm missing here).
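The redundancy argument above can be illustrated with a small toy sketch. It is not IDTxl's algorithm: it assumes linear-Gaussian variables, replaces the estimator with a partial-correlation formula for conditional mutual information, ignores the target's own past, and uses a fixed cut-off (the hypothetical `threshold` argument) where IDTxl would use permutation-based significance tests. The names `gaussian_cmi` and `greedy_parents` and the variables A, B, C are invented for illustration.

```python
import numpy as np


def gaussian_cmi(x, y, z=None):
    """I(x; y | z) in nats, assuming jointly Gaussian, linearly related variables."""
    if z is not None:
        # Regress the conditioning set (plus an intercept) out of both variables.
        design = np.column_stack([z, np.ones(len(x))])
        x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
        y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)


def greedy_parents(target, candidates, threshold=0.01):
    """Repeatedly add the candidate with the largest CMI given those already chosen."""
    selected = []
    while len(selected) < len(candidates):
        cond = np.column_stack([candidates[n] for n in selected]) if selected else None
        scores = {name: gaussian_cmi(values, target, cond)
                  for name, values in candidates.items() if name not in selected}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:  # fixed cut-off stands in for a permutation test
            break
        selected.append(best)
    return selected


rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)          # strong driver of the target
b = a + rng.normal(size=n)      # noisy copy of A: redundant while A is available
c = rng.normal(size=n)          # independent, weaker driver
target = a + 0.5 * c + 0.5 * rng.normal(size=n)

full = greedy_parents(target, {"A": a, "B": b, "C": c})
reduced = greedy_parents(target, {"B": b, "C": c})
print("all candidates:", full)
print("A removed:     ", reduced)
```

In this construction B only enters the model once A is no longer among the candidates, because everything B says about the target is already carried by A.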

peanutnim commented 1 year ago

However, I removed the variables that did not have transfer entropy with the target variable, but in the second iteration of the adjacency matrix, the two companies that had transfer entropy with the target variable in the first round no longer had transfer entropy, and instead, two other companies that originally did not have transfer entropy suddenly had transfer entropy with the target company. Unfortunately, my request to join the Google group has not been approved yet.

mwibral commented 1 year ago

Hi all,

Is it possible that the indices/interpretations of which process is which became jumbled in the analyses? There may also be a misinterpretation of IDTxl's output format, with confusion about lag indices versus process indices.

If not, I have never seen such behaviour and cannot easily think of a reason or a system where things should turn out like this.

Best, Michael
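One way the indices could get jumbled is that dropping columns shifts the process index of every later column. A minimal sketch of a name-based comparison follows. It assumes that the selected sources are reported as (process index, lag) pairs; the helper `name_sources`, the column names and the `selected` list are hypothetical and only serve the illustration.

```python
def name_sources(column_names, selected_vars_sources):
    """Translate (process_index, lag) pairs into (column_name, lag) pairs."""
    return [(column_names[process], lag) for process, lag in selected_vars_sources]


# Column order of the array passed to the analysis in each run (target first).
columns_run1 = ["target", "A", "B", "C", "D", "E"]
columns_run2 = [c for c in columns_run1 if c not in ("B", "C")]  # two columns dropped

# Hypothetical selected sources, identical index pairs in both runs.
selected = [(2, 1), (3, 2)]

run1 = name_sources(columns_run1, selected)
run2 = name_sources(columns_run2, selected)
print("run 1:", run1)
print("run 2:", run2)

# The same company also changes its index between runs.
print("index of D:", columns_run1.index("D"), "->", columns_run2.index("D"))
```

Comparing results by column name rather than by raw index avoids reading a shifted index as a different company.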

peanutnim commented 1 year ago

Hi Michael, here is my code:

    import pandas as pd
    import matplotlib.pyplot as plt
    from idtxl.multivariate_te import MultivariateTE
    from idtxl.data import Data
    from idtxl.visualise_graph import plot_network

    df = pd.read_csv('/Users/yana/Desktop/idtxl1/2.csv')
    df['Date'] = pd.to_datetime(df['date'])
    df = df.set_index('Date')
    columns_to_calculate = df.columns[1:]
    data_matrix = df[columns_to_calculate].to_numpy()
    data = Data(data_matrix, dim_order='sp')

    network_analysis = MultivariateTE()
    settings = {'cmi_estimator': 'JidtKraskovCMI',
                'max_lag_sources': 5,
                'min_lag_sources': 1}

    results = network_analysis.analyse_single_target(settings=settings, data=data, target=0)

    print(results.get_single_target(0, fdr=False))

    results.print_edge_list(weights='max_te_lag', fdr=False)
    plot_network(results=results, weights='max_te_lag', fdr=False)
    plt.show()

I judged the transfer entropy between the source companies and the target company by looking at the directed graph and adjacency matrix. I think there's no problem with this. The same thing happened when I added more stock volatility sequences of companies to the dataset. After running the code, some of the transfer entropy from the original companies to the target company disappeared, and some did not. At the same time, some newly added companies or those that did not show transfer entropy to the target company initially reappeared with transfer entropy. I'm very confused about this. What do you think?

mwibral commented 1 year ago

Hi,

Maybe I should have been clearer: could you provide BOTH versions of the code, the initial one and the one producing the unexpected results? Thanks.

Michael

mwibral commented 1 year ago

One thing I saw is that you're using the graphs that are not corrected for multiple comparisons (fdr=False). This may lead to false positives and unstable results. It may be part of your problem.

Michael

peanutnim commented 1 year ago

Hi Michael, thanks for your kind help. I'm sorry to bother you again. I used the same code twice, only changing the data without altering the code, and encountered an error when setting fdr=True for the graphs. Here is the error:

    Traceback (most recent call last):
      File "/Users/yana/IDTxl/idtxl/results.py", line 463, in get_single_target
        return self._single_target_fdr[target]
               ~~~~~~~^^^^^^^^
    KeyError: 0

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/yana/Desktop/idtxl1/zailaiyibian.py", line 28, in <module>
        print(results.get_single_target(0,fdr=True))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/yana/IDTxl/idtxl/results.py", line 469, in get_single_target
        raise RuntimeError(
    RuntimeError: No FDR-corrected results for target 0. Set fdr=False to see uncorrected results.

I think the error may be related to the fdr setting, but I am unsure how to properly set it for the analyse_single_target method of MultivariateTE.

Thank you for your help, your assistance is greatly appreciated.

mwibral commented 1 year ago

Hi,

This 'error' basically means that no significant TE was detected after correcting for multiple comparisons.

This still doesn't fully explain the behaviour of your code, but it is certainly a sign to treat your results with great caution.

Re the hopping uncorrected significances: are you sure that changing the data and interpreting the indices went all right? (Just checking.)

Best, Michael
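A script can handle this case explicitly instead of crashing. The sketch below relies only on what the traceback shows, namely that `get_single_target(target, fdr=True)` raises a `RuntimeError` when no FDR-corrected results exist for the target. The helper `single_target_with_fallback` and the `StubResults` class are hypothetical; the stub merely stands in for a real results object so that the example runs without IDTxl installed.

```python
def single_target_with_fallback(results, target):
    """Return (single-target results, flag telling whether they are FDR-corrected)."""
    try:
        return results.get_single_target(target, fdr=True), True
    except RuntimeError:
        # No FDR-corrected entry for this target: fall back, but keep a flag
        # so the uncorrected results are not mistaken for corrected ones.
        return results.get_single_target(target, fdr=False), False


class StubResults:
    """Hypothetical stand-in mimicking the behaviour shown in the traceback."""

    def get_single_target(self, target, fdr=True):
        if fdr:
            raise RuntimeError(f"No FDR-corrected results for target {target}.")
        return {"target": target, "selected_vars_sources": [(3, 1)]}


res, corrected = single_target_with_fallback(StubResults(), 0)
print("FDR-corrected:", corrected)
if not corrected:
    print("Treat these links with caution:", res["selected_vars_sources"])
```

Keeping the flag next to the results makes it harder to forget, later in the analysis, that a given set of links never passed the multiple-comparison correction.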

peanutnim commented 1 year ago

I deeply appreciate your help. I will recheck my data and indices. I am sure your amazing work will help a lot in my own research.

If I encounter similar issues in the future, I will consult with you again. Thank you for your assistance and guidance!

peanutnim commented 1 year ago

Hi Michael, I'm sorry to bother you again. I have converted my data into a two-dimensional numpy array, and I am unsure how to incorporate the concept of replications in this context. Therefore, I am uncertain about the appropriate way to add replications.

And during my analysis, I encountered the following warning message and would appreciate your assistance in understanding its meaning and addressing related concerns:

    WARNING: Number of replications is not sufficient to generate the desired number of surrogates. Permuting samples in time instead.
    maximum statistic, n_perm: 200

Could you please provide clarification regarding the meaning and implications of this warning message? What does it indicate when the number of replications is considered insufficient? How does idtxl handle this situation by permuting samples in time?

Additionally, I have been searching for documentation or information pertaining to the replication parameter in idtxl, but have been unable to find any specific details. Could you kindly provide information on how to set and adjust the replication parameter? What is the default value, and how does it impact the analysis?

After encountering the aforementioned warning message, does idtxl automatically adjust any settings or parameters? If so, could you please explain the automatic adjustments that occur after encountering this warning?

I am grateful for your time and support in addressing these concerns. As a student using IDTxl, your guidance would greatly contribute to my research analysis. Thank you for your dedication in developing and maintaining IDTxl.

Best, Pitkin
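On the array layout behind the replication question, a minimal sketch may help. As far as I understand IDTxl's `Data` class, replications are a third array dimension (repeated recordings of the same processes), described by the `dim_order` string, for example `'psr'` for processes × samples × replications; a two-dimensional array such as the `'sp'` one in the code above then amounts to a single replication. The trial data below are random placeholders, and whether a single price history can meaningfully be cut into replications is a modelling question this sketch does not answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_processes, n_samples, n_replications = 3, 100, 5

# Placeholder trials: each one is a (samples x processes) array, e.g. one
# repeated recording of the same set of processes.
trials = [rng.normal(size=(n_samples, n_processes)) for _ in range(n_replications)]

# Stack along a new last axis, then reorder to (processes, samples, replications).
data_3d = np.stack(trials, axis=-1).transpose(1, 0, 2)
print(data_3d.shape)

# With IDTxl installed, the array could then be wrapped like this:
# from idtxl.data import Data
# data = Data(data_3d, dim_order='psr')

# A single 2D (samples x processes) array corresponds to one replication:
# data = Data(trials[0], dim_order='sp')
```

With only one replication there are no trials to shuffle, which is consistent with the warning stating that samples are permuted in time instead.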