Closed aleksejs-fomins closed 3 years ago
Just to check that we are on the same page:
dataNumpy = np.random.randint(0, 4, (3967, 27, 1))
data = Data(dataNumpy, 'rps', normalise=False)
'rps' means that you have 3967 replications, 27 processes and 1 sample. I'm not sure if you meant to have one sample per replication.
Could it be that _check_input is complaining because there is only a single point in the samples? Not sure, just spitballing ;)
P.S. I tried your script and it worked. See the printout below (I replaced the last line with print(results_SxPID.get_single_target(3)['avg'])).
Adding data with properties: 27 processes, 1 samples, 3967 replications
overwriting existing data
{((1,),): (0.2476656862520551, 0.24607925544106224, 0.0015864308109929143),
 ((2,),): (0.24793261350356333, 0.24779678052787935, 0.00013583297568396408),
 ((3,),): (0.247689225543444, 0.2462293130099496, 0.001459912533494354),
 ((1, 2),): (0.29220754752122313, 0.28964287621002227, 0.002564671311200752),
 ((1, 3),): (0.29209258144036937, 0.2891251698786944, 0.0029674115616751106),
 ((2, 3),): (0.2917557182351069, 0.2888069078560238, 0.0029488103790827687),
 ((1, 2, 3),): (0.8932421388524286, 0.874174272951998, 0.01906786590043029),
 ((1,), (2,)): (0.4022225644074161, 0.40230874270205663, -8.617829464024637e-05),
 ((1,), (3,)): (0.4022572799304044, 0.4028430589511877, -0.0005857790207830953),
 ((1,), (2, 3)): (0.1572956243376338, 0.15752912133024555, -0.00023349699261174833),
 ((2,), (3,)): (0.40155467460702093, 0.4010003414904268, 0.0005543331165942527),
 ((2,), (1, 3)): (0.15720685212840368, 0.15707541183717216, 0.00013144029123148703),
 ((3,), (1, 2)): (0.15724848955315024, 0.15740901803317575, -0.00016052848002539878),
 ((1, 2), (1, 3)): (0.26591816740297625, 0.2640435177112699, 0.001874649691706341),
 ((1, 2), (2, 3)): (0.26663439000495925, 0.2670846887533893, -0.0004502987484300561),
 ((1, 3), (2, 3)): (0.2663380259184582, 0.26493835426047396, 0.0013996716579842122),
 ((1,), (2,), (3,)): (0.7905066073476456, 0.7904319828757621, 7.462447188370683e-05),
 ((1, 2), (1, 3), (2, 3)): (0.20905590444451516, 0.20953758316873955, -0.00048167872422443805)}
Yes, 1 sample is intended. Perhaps it's clumsy; I should just delete that dimension and work with 'rp' instead of 'rps'.
Yes, the minimal example is working, that is what I wrote. It is my best effort to construct a minimal example at the moment. I would be happy to provide you with a minimal example that does not work, but to the best of my understanding this is the minimal example and I don't understand why it works and my code does not.
If possible, could you please explain to me what exactly "line 167" in "estimators_multivariate_pid.py" is checking for, and how that test can fail if the input is a numpy array of type int64?
Hi Aleksejs,
1 sample would mean that per experiment you only get a single time point, but you have run 3000+ experiments or epochs - is that really correct?
Michael
On 08.07.21 11:06, Aleksejs Fomins wrote:
1 sample is intended
I honestly need more context. Line 167 simply checks (as you understood correctly) whether the dtype of each source (i.e. process) is a numpy integer type.
Your verification might not be enough since it is too far from the if statement and probably something in between went wrong. I would do step by step debugging (maybe print debugging too).
To be crystal clear: right before the if statement, print issubclass(s[i].dtype.type, np.integer) and s[i].dtype.type separately.
Otherwise you need to give more info so that I can help you. The only thing I can think of is that the data is not appropriately assigned, but this is just speculation.
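The check suggested above can be sketched as a small helper. This is a minimal sketch, not IDTxl code: report_integer_dtypes is a hypothetical name, and s is assumed to be a list holding one numpy array per source, as it would be just before the if statement.

```python
import numpy as np


def report_integer_dtypes(s):
    """Print and return, per source array, whether its dtype is a numpy integer.

    Hypothetical debugging helper; not part of IDTxl.
    """
    flags = []
    for i, source in enumerate(s):
        dtype_type = source.dtype.type
        is_int = issubclass(dtype_type, np.integer)
        # Print both values separately so a silent float conversion is visible.
        print(f"source index {i}: dtype.type={dtype_type}, integer={is_int}")
        flags.append(is_int)
    return flags


# Example: the second array was converted to float and fails the check.
sources = [np.array([0, 1, 2]), np.array([0, 1, 2]).astype(float)]
flags = report_integer_dtypes(sources)
```

Calling the helper on the arrays that actually reach the estimator, rather than on the raw input, is what makes the check meaningful.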
Michael: Yes, that is exactly what I mean. For example: I have a short reward phase (part of the trial where a mouse gets or does not get a water reward), e.g. 10 timesteps. During this phase, calcium signal is highly autocorrelated. So, instead of using 10 timesteps, I average them out, and now I have only 1 timestep, but well-behaved. Now, the mouse does the trial 3000+ times, hence the shape of the input.
@Abzinger Here is a part of my actual code
print(settings)
print("Check1", dataEff.shape, dataEff.dtype, src, trg)
print('Check2', issubclass(dataEff.dtype.type, np.integer))
print('Check3', [issubclass(dataEff[:, i].dtype.type, np.integer) for i in src])
print('Check4', issubclass(dataEff[:, trg].dtype.type, np.integer))
dataIDTxl = Data(dataEff, dim_order='rps')
pid = MultivariatePID()
rez = pid.analyse_single_target(settings=settings, data=dataIDTxl, target=trg, sources=src)
And the output below. To the best of my understanding, I have checked that the data is of type numpy integer as you have suggested, and it indeed is. I am a bit puzzled about how to continue debugging this.
{'pid_estimator': 'SxPID', 'lags_pid': [0, 0, 0]}
Check1 (3967, 27, 1) int64 [0, 1, 2] 3
Check2 True
Check3 [True, True, True]
Check4 True
Adding data with properties: 27 processes, 1 samples, 3967 replications
overwriting existing data
Traceback (most recent call last):
File "/home/alyosha/work/git/pub-2020-exploratory-analysis/analysis-gallerosalas-raw/extern/multiscale-pid-joint4D.py", line 38, in <module>
pid_multiprocess_mouse(dataDB, mc, h5outname, argSweepDict, exclQueryLst, metric='MultivariatePID',
File "/home/alyosha/work/git/pub-2020-exploratory-analysis/lib/analysis/pid_multiprocess.py", line 105, in pid_multiprocess_mouse
rezIdxs, rezVals = pid.pid(dataLst, mc, metric=metric, dim=dim, nBin=nBin,
File "/home/alyosha/work/git/pub-2020-exploratory-analysis/lib/analysis/pid_common.py", line 88, in pid
rez = mc.metric3D(metricName, '',
File "/home/alyosha/work/git/mesostat-dev/mesostat/metric/metric.py", line 265, in metric3D
rez = sweepGen.unpack(self.mapper.mapMultiArg(wrappedFunc, sweepGen.iterator()))
File "/home/alyosha/work/git/mesostat-dev/mesostat/utils/parallel.py", line 43, in mapMultiArg
return self.map(f_proxy, x)
File "/home/alyosha/work/git/mesostat-dev/mesostat/utils/parallel.py", line 33, in map
rez = self.map_func(f, x)
File "/home/alyosha/work/git/mesostat-dev/mesostat/utils/parallel.py", line 15, in <lambda>
self.map_func = lambda f,x: list(map(f, x))
File "/home/alyosha/work/git/mesostat-dev/mesostat/utils/parallel.py", line 42, in <lambda>
f_proxy = lambda task: f(*task)
File "/home/alyosha/work/git/mesostat-dev/mesostat/metric/metric.py", line 262, in <lambda>
wrappedFunc = lambda data, settings: metricFunc(data, {**settings, **metricSettings})
File "/home/alyosha/work/git/mesostat-dev/mesostat/utils/decorators.py", line 63, in inner
rez = func(*args, **kwargs)
File "/home/alyosha/work/git/mesostat-dev/mesostat/metric/idtxl_pid.py", line 116, in multivariate_pid_4D
rez = pid.analyse_single_target(settings=settings['settings_estimator'], data=dataIDTxl, target=trg, sources=src)
File "/home/alyosha/Downloads/IDTxl/idtxl/multivariate_pid.py", line 200, in analyse_single_target
self._calculate_pid(data)
File "/home/alyosha/Downloads/IDTxl/idtxl/multivariate_pid.py", line 281, in _calculate_pid
orig_pid = self._pid_estimator.estimate(
File "/home/alyosha/Downloads/IDTxl/idtxl/estimators_multivariate_pid.py", line 78, in estimate
s, t, self.settings = _check_input(s, t, self.settings)
File "/home/alyosha/Downloads/IDTxl/idtxl/estimators_multivariate_pid.py", line 167, in _check_input
raise TypeError('Input s{0} (source {0}) must be an integer numpy '
TypeError: Input s1 (source 1) must be an integer numpy array.
Process finished with exit code 1
Ok, I have saved the variable dataEff into a file, then tried running the minimal example by loading this variable, and it works. So the code with exactly the same inputs works separately from my code, but does not work inside of it. While I still don't know what it is, I suspect that it has nothing to do with IDTxl. It's probably one of those glitches where Python somehow reports the wrong bug. I am sorry to trouble you.
Good that it works now. No worries, I think we will close this issue now :)
Thanks everyone for clearing this up! I will close the issue.
Dear all,
After further testing, I have been able to construct a minimal example of a bug, and now I suspect it is indeed a bug of IDTxl.
Here is a minimal example that WORKS:
import numpy as np
from idtxl.multivariate_pid import MultivariatePID
from idtxl.data import Data
dataOrig = np.random.randint(0, 4, (1209, 27, 1))
data = Data(dataOrig, 'rps', normalise=False)
pid = MultivariatePID()
settings_SxPID = {'pid_estimator': 'SxPID', 'lags_pid': [0, 0, 0]}
results_SxPID = pid.analyse_single_target(settings=settings_SxPID, data=data, target=3, sources=(0, 1, 2))
print(results_SxPID.get_single_target(3)['avg'])
Here is a minimal example that DOES NOT WORK:
import numpy as np
from idtxl.multivariate_pid import MultivariatePID
from idtxl.data import Data
def myfunction(data, settings):
    dataIDTxl = Data(data, dim_order='rps')
    pid = MultivariatePID()
    rez = pid.analyse_single_target(settings=settings,
                                    data=dataIDTxl, target=3, sources=(0,1,2))
    return rez.get_single_target(3)['avg']
dataOrig = np.random.randint(0, 4, (1209, 27, 1))
settings = {'pid_estimator': 'SxPID', 'lags_pid': [0, 0, 0]}
print(myfunction(dataOrig, settings))
For whatever reason, the MultivariatePID class seems to misbehave when wrapped in a function. Any advice is appreciated.
Hi Aleksejs,
The problem is with the data within myfunction, not with the code being wrapped in a function.
I tried the example that doesn't work, and indeed it didn't work.
However, when I changed the first line in myfunction from
dataIDTxl = Data(data, dim_order='rps')
to
dataIDTxl = Data(data, 'rps', normalise=False)
it worked.
The problem stems from not setting the normalise option to False. normalise is True by default, i.e. the data is z-transformed and its type is converted to float.
Tbh I don't recall why the sources and targets are chosen to be integers but maybe @pwollstadt remembers why :grimacing:
Cheers, Abed
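The effect described above can be reproduced with numpy alone. The sketch below z-scores an integer array by hand and shows that the result no longer passes an integer dtype check of the kind that raised the TypeError. The hand-written z-transform is an assumed stand-in for what normalisation does; it is not copied from IDTxl, and is_integer_array is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Integer data with the same shape as in the minimal example.
data_int = rng.integers(0, 4, size=(1209, 27, 1))

# Hand-written z-transform over the replication axis (assumption: this only
# mimics normalisation and is not IDTxl's implementation).
data_z = (data_int - data_int.mean(axis=0)) / data_int.std(axis=0)


def is_integer_array(x):
    """Same style of check as discussed for line 167 (hypothetical helper)."""
    return issubclass(x.dtype.type, np.integer)


print(data_int.dtype, is_integer_array(data_int))
print(data_z.dtype, is_integer_array(data_z))
```

The integer input passes the check, while the z-scored copy is a float array and does not, which is consistent with the TypeError only appearing when normalise=False was left out.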
Uff, I'm blind, I totally missed that there was a difference in the normalise parameter. Thanks so much for checking. I'll try to run it again and get back to you.
Hi all,
maybe we should require the normalise parameter to be set explicitly when creating the data object, since normalising makes no sense at all if one wants to analyse discrete data. I also totally missed that.
Michael
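One way to read this suggestion is a constructor whose normalisation flag has no default, so that callers must choose. The sketch below uses a hypothetical stand-in class, StrictData; it is not IDTxl's Data class and only illustrates the keyword-only pattern in Python.

```python
import numpy as np


class StrictData:
    """Hypothetical stand-in for a data container; not IDTxl's Data class."""

    def __init__(self, data, dim_order, *, normalise):
        # `normalise` is keyword-only and has no default, so leaving it out
        # raises a TypeError when the object is created.
        self.dim_order = dim_order
        if normalise:
            # Simple global z-score, for illustration only.
            data = (data - data.mean()) / data.std()
        self.data = data


discrete = np.random.randint(0, 4, (1209, 27, 1))

# Explicit choice: the integer dtype is preserved.
ok = StrictData(discrete, "rps", normalise=False)

# Omitting the flag is rejected instead of silently normalising.
try:
    StrictData(discrete, "rps")
    omitted_flag_rejected = False
except TypeError:
    omitted_flag_rejected = True
```

With this pattern the mistake discussed in this thread would surface at construction time rather than deep inside the estimator.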
When running the MultivariatePID estimator, I get the following bug:
Sadly, I am unable to replicate the bug with the minimal example. Here is the minimal example of what I am trying to do:
I have so far:
checked that the input (dataNumpy) is indeed of type int64 by printing its dtype just prior to execution of analyse_single_target
What am I missing?