pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0
243 stars 77 forks source link

Problem acting on neo spike train data using IDTxI #13

Closed russelljjarvis closed 5 years ago

russelljjarvis commented 6 years ago

It just occurred to me that if I pickle the time-binned spike trains (small data footprint) and add them to the repository, that could speed up reproduction and make many of the steps below superfluous.
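For reference, a minimal sketch of that pickling step, assuming the binned trains are already a NumPy array of 0/1 values. The array shape, the random data, and the temporary directory are made up for illustration; only the file name binary_trains.p matches the file added to the repository later in this thread.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for time-binned spike trains: 4 units x 50 bins of 0/1.
rng = np.random.default_rng(0)
binary_trains = (rng.random((4, 50)) < 0.2).astype(int)

# Write the array to a pickle file (a temp dir is used here for illustration).
path = os.path.join(tempfile.mkdtemp(), 'binary_trains.p')
with open(path, 'wb') as f:
    pickle.dump(binary_trains, f)

# Read it back and keep the result so the round trip can be checked.
with open(path, 'rb') as f:
    restored = pickle.load(f)
```

Comparing `restored` against `binary_trains` with `np.array_equal` is a cheap sanity check before committing the file.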

Full reproduction:

git clone https://github.com/russelljjarvis/HippNetTE.git

https://github.com/russelljjarvis/HippNetTE/blob/master/Dockerfile

sudo docker build -t te .

Define a Bash alias in ~/.bashrc, or improvise something to this effect:

alias fax='cd ~/git/DAnalysis; sudo docker run -it -v ~/git/DAnalysis:/home/jovyan/QIASCOLI -v /tmp/.X11-unix:/tmp/.X11-unix --env="DISPLAY" te /bin/bash'

Launch docker environment:

source ~/.bashrc
fax

Inside the Docker container, generate the pickled neo files by running forked.py (they are too big to add to GH).

Finally, run the multivariate transfer entropy analysis (plus some preliminary data wrangling) on the pickled neo files:

python3 sate.py

Adding data with properties: 1 processes, 122 samples, 400 replications
overwriting existing data
<idtxl.data.Data object at 0x7ff00cba0cc0>

####### analysing target with index 0 from list [0]
PyOpenCl is not available on this system. Install it using pip or the package manager to use OpenCL-powered CMI estimation.
  File "/opt/conda/lib/python3.5/site-packages/idtxl/estimators_opencl.py", line 7, in <module>
    import pyopencl as cl

Traceback (most recent call last):
  File "sate.py", line 55, in <module>
    te(mdfl)
  File "sate.py", line 46, in te
    res_full = mte.analyse_network(settings=settings, data=dat)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/multivariate_te.py", line 163, in analyse_network
    sources[t])
  File "/opt/conda/lib/python3.5/site-packages/idtxl/multivariate_te.py", line 273, in analyse_single_target
    self._initialise(settings, data, sources, target)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/network_inference.py", line 51, in _initialise
    self._cmi_estimator = EstimatorClass(settings)
TypeError: Can't instantiate abstract class JidtKraskov with abstract methods estimate
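The final TypeError is Python's generic complaint when a class that still has an unimplemented abstract method is instantiated. The sketch below reproduces only that mechanism; the class names BaseEstimator and ConcreteEstimator are made up for illustration and this does not use IDTxl.

```python
from abc import ABC, abstractmethod


class BaseEstimator(ABC):
    """Hypothetical abstract base: declares estimate() but does not implement it."""

    @abstractmethod
    def estimate(self, var1, var2):
        ...


class ConcreteEstimator(BaseEstimator):
    """Hypothetical subclass that supplies the missing method."""

    def estimate(self, var1, var2):
        return 0.0


# Instantiating the abstract base fails with a TypeError.
try:
    BaseEstimator()
    error_message = None
except TypeError as err:
    error_message = str(err)

# The concrete subclass can be instantiated and called normally.
value = ConcreteEstimator().estimate([0, 1], [1, 0])
```

If that is what is happening here, the estimator name in the settings would be resolving to a base class rather than to a concrete estimator.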



I have also updated the other GH issue, which pertains more to the Docker build.

russelljjarvis commented 6 years ago

I just updated the repository to include a binary_trains.p pickle file. The code below could be a minimal working example for recreating the error:

    import numpy as np
    from idtxl.multivariate_te import MultivariateTE
    from idtxl.data import Data
    from idtxl import visualise_graph
    import pickle

    n_procs = 1
    # Note: this first settings dict is overwritten further down and never used.
    settings = {
        'cmi_estimator': 'JidtDiscreteCMI',
        'n_perm_max_stat': 21,
        'max_lag_target': 5,
        'max_lag_sources': 5,
        'min_lag_sources': 4}
    with open('binary_trains.p', 'rb') as f:
        binary_trains = pickle.load(f)

    dat = Data(np.array(binary_trains), dim_order='spr')

    dat.n_procs = n_procs
    # These are the settings actually passed to analyse_network.
    settings = {'cmi_estimator': 'JidtKraskov',
                'max_lag_sources': 3,
                'max_lag_target': 3,
                'min_lag_sources': 1}
    print(dat)
    mte = MultivariateTE()
    res_full = mte.analyse_network(settings=settings, data=dat)

    # generate graph plots (res_single is not defined in this snippet,
    # so only the full network is plotted)
    g_full = visualise_graph.plot_network(res_full)

pwollstadt commented 6 years ago

I just had a look at your script; thanks for posting an MWE. Some things I noticed:

Here is a modified version of your script:

import numpy as np
from idtxl.multivariate_te import MultivariateTE
from idtxl.data import Data
import pickle

with open('HippNetTE/binary_trains.p', 'rb') as f:
    binary_trains = pickle.load(f)

I had to replace a few non-binary entries in your data set. If you don't want to do that, you have to change 'n_discrete_bins' in your settings dict (the default is 2 for binary data).

spikes = np.array(binary_trains)
spikes[spikes == 2] = 1
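A quick way to see which symbols actually occur before deciding between binarising and enlarging the alphabet is sketched below. The small array is made up for illustration, and treating the alphabet size as the maximum value plus one is an assumption about how the bins are counted.

```python
import numpy as np

# Hypothetical spike-count array in which a few bins hold two spikes.
spikes = np.array([[0, 1, 2, 0],
                   [1, 0, 0, 2]])

# Which symbols occur, and how often?
values, counts = np.unique(spikes, return_counts=True)

# Assumed alphabet size if the data are kept as they are: symbols 0..max.
alphabet_size = int(spikes.max()) + 1

# Alternative: collapse every count above 1 to 1 to get strictly binary data.
binarised = spikes.copy()
binarised[binarised > 1] = 1
```

Working on a copy keeps the original counts around in case the larger alphabet turns out to be the better choice.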

I changed the ordering of your axes by setting dim_order='prs', i.e., IDTxl will interpret the first axis as processes, the second as replications, and the third as samples. This is just to illustrate how to use the multivariate TE algorithm, because in your example you had only one process, which wouldn't allow us to estimate any TE. Note that I set normalise to False and I changed the estimator name. I only analyze one target (targets=[0]) to make it faster.

dat = Data(spikes, dim_order='prs', normalise=False)
settings = {
    'cmi_estimator': 'JidtDiscreteCMI',
    'n_perm_max_stat': 21,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 4}
mte = MultivariateTE()
res_full = mte.analyse_network(settings=settings, data=dat, targets=[0])
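To make the effect of dim_order concrete, here is a small sketch that maps axis lengths to roles without touching IDTxl. The helper describe_dims is made up for illustration, and the array shape (122, 1, 400) is an assumption inferred from the "Adding data with properties" lines logged in this thread.

```python
import numpy as np


def describe_dims(shape, dim_order):
    """Map each axis length to the role named in dim_order ('p', 's', 'r')."""
    names = {'p': 'processes', 's': 'samples', 'r': 'replications'}
    return {names[d]: n for d, n in zip(dim_order, shape)}


# Assumed shape of the pickled array (inferred from the logs, not checked).
arr = np.zeros((122, 1, 400), dtype=int)

as_spr = describe_dims(arr.shape, 'spr')  # ordering used in the original script
as_prs = describe_dims(arr.shape, 'prs')  # ordering used in the modified script
```

Under that assumption, 'spr' yields a single process (so there is nothing to estimate TE between), while 'prs' yields 122 processes with 400 samples each.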

Hope this helps. Using IDTxl to analyze discrete data is still a bit tricky, my apologies. We are working on more documentation and example scripts to make this easier. Let me know if it works.

russelljjarvis commented 6 years ago

I recently tried again.

Code is as follows:

import numpy as np
from idtxl.multivariate_te import MultivariateTE
from idtxl.data import Data
import pickle

with open('binary_trains.p', 'rb') as f:
    binary_trains = pickle.load(f)
spikes = np.array(binary_trains)
spikes[spikes == 2] = 1

dat = Data(spikes, dim_order='prs', normalise=False)
settings = {
    'cmi_estimator': 'JidtDiscreteCMI',
    'n_perm_max_stat': 21,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 4}
mte = MultivariateTE()
res_full = mte.analyse_network(settings=settings, data=dat, targets=[0])

It errors like this:

Adding data with properties: 122 processes, 400 samples, 1 replications
overwriting existing data

####### analysing target with index 0 from list [0]
Testing sources [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121]

---------------------------- (1) include target candidates
> /opt/conda/lib/python3.5/site-packages/idtxl/estimators_jidt.py(701)estimate()
-> calc = self.CalcClass(max_base, self.settings['lag'])
(Pdb) max_base
2
(Pdb) self
<idtxl.estimators_jidt.JidtDiscreteMI object at 0x7f785fc2db38>
(Pdb) type(self)
<class 'idtxl.estimators_jidt.JidtDiscreteMI'>
(Pdb) c
Traceback (most recent call last):
  File "sate2.py", line 21, in <module>
    res_full = mte.analyse_network(settings=settings, data=dat, targets=[0])
  File "/opt/conda/lib/python3.5/site-packages/idtxl/multivariate_te.py", line 163, in analyse_network
    sources[t])
  File "/opt/conda/lib/python3.5/site-packages/idtxl/multivariate_te.py", line 277, in analyse_single_target
    self._include_target_candidates(data)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/network_inference.py", line 196, in _include_target_candidates
    sources_found = self._include_candidates(candidates, data)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/network_inference.py", line 253, in _include_candidates
    conditional=self._selected_vars_realisations)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/estimator.py", line 301, in estimate_mult
    res[i] = self.estimate(**chunk_data)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/estimators_jidt.py", line 531, in estimate
    return est.estimate(var1, var2, return_calc)
  File "/opt/conda/lib/python3.5/site-packages/idtxl/estimators_jidt.py", line 701, in estimate
    calc = self.CalcClass(max_base, self.settings['lag'])
  File "/opt/conda/lib/python3.5/site-packages/jpype/_jpackage.py", line 62, in __call__
    raise TypeError("Package {0} is not Callable".format(self.__name))
TypeError: Package infodynamics.measures.discrete.MutualInformationCalculatorDiscrete is not Callable
mwibral commented 6 years ago

Hi all,

I'm not sure this still applies to current versions of the code, but it used to be necessary to explicitly declare the numeric type of the numpy array as integer (e.g., instead of float or boolean, or whatever numpy would default to here).
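A minimal sketch of such a check and cast, using only NumPy; the example array is made up, and whether current IDTxl versions still need this is not confirmed here.

```python
import numpy as np

# Hypothetical spike array that ended up as float.
spikes = np.array([[0.0, 1.0, 1.0],
                   [1.0, 0.0, 0.0]])
is_int_before = np.issubdtype(spikes.dtype, np.integer)

# Cast only if every value is a whole number, so no information is lost.
if not np.all(spikes == np.round(spikes)):
    raise ValueError('Array contains non-integer values; discretise first.')
spikes_int = spikes.astype(int)
is_int_after = np.issubdtype(spikes_int.dtype, np.integer)
```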

Michael

On 15.05.2018 05:18, Russell Jarvis wrote:

Alright I had time to retry. I run the code below:

import numpy as np
from idtxl.multivariate_te import MultivariateTE
from idtxl.data import Data
import pickle

with open('binary_trains.p', 'rb') as f:
    binary_trains = pickle.load(f)
spikes = np.array(binary_trains)
spikes[spikes == 2] = 1

dat = Data(spikes, dim_order='prs', normalise=False)
settings = {
    'cmi_estimator': 'JidtDiscreteCMI',
    'n_perm_max_stat': 21,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 4}
mte = MultivariateTE()
res_full = mte.analyse_network(settings=settings, data=dat, targets=[0])

And I get the following error messages:

jovyan@2b26b72b4411:~/QIASCOLI$ ipython sate2.py
Adding data with properties: 122 processes, 400 samples, 1 replications
overwriting existing data

####### analysing target with index 0 from list [0]
Testing sources [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121]

---------------------------- (1) include target candidates
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/jovyan/QIASCOLI/sate2.py in <module>()
     19     'min_lag_sources': 4}
     20 mte = MultivariateTE()
---> 21 res_full = mte.analyse_network(settings=settings, data=dat, targets=[0])

/opt/conda/lib/python3.5/site-packages/idtxl/multivariate_te.py in analyse_network(self, settings, data, targets, sources)
    161                 data,
    162                 targets[t],
--> 163                 sources[t])
    164
    165         # Perform FDR-correction on the network level. Add FDR-corrected

/opt/conda/lib/python3.5/site-packages/idtxl/multivariate_te.py in analyse_single_target(self, settings, data, target, sources)
    275         # Main algorithm.
    276         print('\n---------------------------- (1) include target candidates')
--> 277         self._include_target_candidates(data)
    278         print('\n---------------------------- (2) include source candidates')
    279         self._include_source_candidates(data)

/opt/conda/lib/python3.5/site-packages/idtxl/network_inference.py in _include_target_candidates(self, data)
    194                                 -self.settings['tau_target']).tolist()
    195         candidates = self._define_candidates(procs, samples)
--> 196         sources_found = self._include_candidates(candidates, data)
    197
    198         # If no candidates were found in the target's past, add at least one

/opt/conda/lib/python3.5/site-packages/idtxl/network_inference.py in _include_candidates(self, candidate_set, data)
    251                 var1=cand_real,
    252                 var2=self._current_value_realisations,
--> 253                 conditional=self._selected_vars_realisations)
    254
    255             # Test max CMI for significance with maximum statistics.

/opt/conda/lib/python3.5/site-packages/idtxl/estimator.py in estimate_mult(self, n_chunks, re_use, **data)
    299             for v in re_use:
    300                 chunk_data[v] = data[v]
--> 301             res[i] = self.estimate(**chunk_data)
    302             idx_1 = idx_2
    303             idx_2 += chunk_size

/opt/conda/lib/python3.5/site-packages/idtxl/estimators_jidt.py in estimate(self, var1, var2, conditional, return_calc)
    529             # Return value will be just the estimate if return_calc is False,
    530             # or estimate plus the JIDT MI calculator if return_calc is True:
--> 531             return est.estimate(var1, var2, return_calc)
    532         else:
    533             assert(conditional.size != 0), 'Conditional Array is empty.'

/opt/conda/lib/python3.5/site-packages/idtxl/estimators_jidt.py in estimate(self, var1, var2, return_calc)
    697         max_base = int(max(np.power(self.settings['alph1'], var1_dim),
    698                            np.power(self.settings['alph2'], var2_dim)))
--> 699         calc = self.CalcClass(max_base, self.settings['lag'])
    700         calc.setDebug(self.settings['debug'])
    701         calc.initialise()

/opt/conda/lib/python3.5/site-packages/jpype/_jpackage.py in __call__(self, *arg, **kwarg)
     60
     61     def __call__(self, *arg, **kwarg):
---> 62         raise TypeError("Package {0} is not Callable".format(self.__name))

TypeError: Package infodynamics.measures.discrete.MutualInformationCalculatorDiscrete is not Callable


--

Prof. Dr. rer. nat. Michael Wibral MEG Labor, Brain Imaging Center Goethe Universität

Heinrich Hoffmann Strasse 10 60528 Frankfurt am Main

Phone: +49 69 6301 83193 Fax: +49 69 6301 83231

pwollstadt commented 6 years ago

As @mwibral pointed out, data have to be in the appropriate format, which is int for the discrete estimator. @russelljjarvis, I checked that, and your data seem to be saved as integers in binary_trains.p and are also imported as int. So this shouldn't be the problem (also, the discrete estimator checks whether data are passed as integers and throws a different error if they are not in an appropriate format). Just to make sure, you can check the data type by calling dat.data_type. The JIDT estimator usually complains about "no matching overloads found" if the input is incorrect (wrong number or type of variables). To me, your error sounds like some problem with JPype. Can you check if you can call a JIDT estimator directly:

import numpy as np
from idtxl.estimators_jidt import JidtDiscreteMI

# Generate random test data
n = 1000
var1 = np.random.normal(0, 1, size=n)
var2 = np.random.normal(0, 1, size=n)  

settings = {'discretise_method': 'equal', 'num_discrete_bins': 5}
est = JidtDiscreteMI(settings)
mi = est.estimate(var1, var2)
print('Estimated MI: {0:.5f}'.format(mi))

If this throws a similar error, make sure you have the correct version of JPype installed. Another thing you could do is make sure you have downloaded the latest version of the repo. You can try the current develop branch of the toolbox. We used a newer Java version for some development but went back to version 1.6 in the last commit to the develop branch. Maybe a wrong Java version is causing the error (see also this issue, which is resolved in the current development version of the toolbox). Let me know how it goes!
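One thing that can be ruled out without starting a JVM is whether the JIDT jar actually contains the class named in the error. The sketch below treats the jar as a zip archive and looks for the class entry. The helper name jar_has_class and the dummy archive are made up for illustration; in practice you would point it at the jar your IDTxl install loads (assumed here to be called infodynamics.jar). Reading the "is not Callable" message as "JPype could not resolve the name to a class" is also an assumption, not something confirmed in this thread.

```python
import os
import tempfile
import zipfile

# Archive entry corresponding to the class named in the error message.
CLASS_ENTRY = ('infodynamics/measures/discrete/'
               'MutualInformationCalculatorDiscrete.class')


def jar_has_class(jar_path, class_entry=CLASS_ENTRY):
    """Return True if jar_path exists and contains class_entry."""
    if not os.path.isfile(jar_path):
        return False
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()


# Demonstration on a dummy archive, since the real jar path is install-specific.
demo_jar = os.path.join(tempfile.mkdtemp(), 'demo.jar')
with zipfile.ZipFile(demo_jar, 'w') as jar:
    jar.writestr(CLASS_ENTRY, b'')

found = jar_has_class(demo_jar)
missing = jar_has_class(os.path.join(tempfile.mkdtemp(), 'absent.jar'))
```

If the class is present in the real jar, the jar itself is unlikely to be the culprit and the JPype or Java version remains the more likely suspect.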

pwollstadt commented 5 years ago

Hi all, I am closing this issue because it has been inactive for a few months. Please reopen if you still encounter this issue with the latest stable version. Thanks!