py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.
https://causal-learn.readthedocs.io/en/latest/
MIT License

I failed to use the KCI method on big data. Does this method not support big data? #134

Closed maozhy3 closed 9 months ago

maozhy3 commented 9 months ago

I failed to use the KCI method on big data. Does this method not support big data?

```python
from causallearn.utils.cit import CIT

# load methylation (omitted)

pValue = []
kci_obj = CIT(methylation[1:1001, [0, 1]], "kci", kernelZ='Polynomial', approx=False, est_width='median')
pValue.append(kci_obj(0, 1))
print(pValue)
```

It returns:

```
C:\ProgramData\Anaconda3\envs\env_TianChi\lib\site-packages\numpy\core\_methods.py:152: RuntimeWarning: overflow encountered in reduce
  arrmean = umr_sum(arr, axis, dtype, keepdims=True, where=where)
[0.0]

Process finished with exit code 0
```

When I try `methylation[1:1002, [0,1]]`:

```
C:\ProgramData\Anaconda3\envs\env_TianChi\python.exe C:/Users/Scout/Desktop/TianChi/20230912/data_h5_filtered_kci.py
C:\ProgramData\Anaconda3\envs\env_TianChi\lib\site-packages\numpy\core\_methods.py:152: RuntimeWarning: overflow encountered in reduce
  arrmean = umr_sum(arr, axis, dtype, keepdims=True, where=where)
Traceback (most recent call last):
  File "C:\Users\Scout\Desktop\TianChi\20230912\data_h5_filtered_kci.py", line 50, in <module>
    pValue.append(kci_obj(0, 1) )
  File "C:\ProgramData\Anaconda3\envs\env_TianChi\lib\site-packages\causallearn\utils\cit.py", line 191, in __call__
    p = self.kci_ui.compute_pvalue(self.data[:, Xs], self.data[:, Ys])[0] if len(condition_set) == 0 else \
  File "C:\ProgramData\Anaconda3\envs\env_TianChi\lib\site-packages\causallearn\utils\KCI\KCI.py", line 88, in compute_pvalue
    null_dstr = self.null_sample_spectral(Kxc, Kyc)
  File "C:\ProgramData\Anaconda3\envs\env_TianChi\lib\site-packages\causallearn\utils\KCI\KCI.py", line 194, in null_sample_spectral
    num_eig = np.int(np.floor(T / 2))
  File "C:\ProgramData\Anaconda3\envs\env_TianChi\lib\site-packages\numpy\__init__.py", line 319, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Process finished with exit code 1
```
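The `AttributeError` in the traceback comes from the removed `np.int` alias, not from the data size as such. Below is a minimal sketch of the replacement the NumPy message itself recommends, applied to the expression shown at `KCI.py` line 194. The helper name `half_sample_count` is hypothetical and used only for illustration. The shim at the end is a stopgap assumption for environments where the installed causal-learn cannot be edited or upgraded; it is not part of the library.

```python
import numpy as np


def half_sample_count(T):
    """Same value as np.int(np.floor(T / 2)), using the builtin int.

    `half_sample_count` is a hypothetical name for illustration only.
    """
    return int(np.floor(T / 2))


# Stopgap (assumption, not part of causal-learn): restore the alias
# before importing code that still calls np.int on a NumPy version
# where the alias has been removed.
if not hasattr(np, "int"):
    np.int = int

print(half_sample_count(1001))
```

A newer causal-learn release may already use the builtin `int` here, so upgrading the package is worth trying before patching anything by hand.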

kunwuz commented 9 months ago

What's the sample size of your data? The complexity of KCI scales cubically in the number of samples, so it might not be efficient with a large dataset.

maozhy3 commented 9 months ago

Thank you very much for answering. My sample size is around 8000.
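To put the cubic scaling mentioned above into numbers for roughly 8000 samples, here is a rough sketch using NumPy only. The helper names (`relative_kci_cost`, `kernel_matrix_bytes`, `subsample_rows`) are hypothetical. The cost model is a simple n³ ratio and not a measured benchmark. Random subsampling is one possible workaround, assumed here for illustration; it trades statistical power for speed.

```python
import numpy as np


def relative_kci_cost(n, n_ref=1000):
    """Rough cost ratio versus n_ref samples, assuming O(n^3) scaling."""
    return (n / n_ref) ** 3


def kernel_matrix_bytes(n, itemsize=8):
    """Memory for one dense n x n float64 kernel matrix."""
    return n * n * itemsize


def subsample_rows(data, max_samples, seed=0):
    """Return at most max_samples rows, drawn without replacement."""
    n = data.shape[0]
    if n <= max_samples:
        return data
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=max_samples, replace=False)
    return data[np.sort(idx)]  # keep the original row order


print(relative_kci_cost(8000))          # cost relative to 1000 samples
print(kernel_matrix_bytes(8000) / 1e6)  # MB per kernel matrix

# Usage with the call from the original post (requires causal-learn):
# from causallearn.utils.cit import CIT
# small = subsample_rows(methylation[:, [0, 1]], max_samples=1000)
# kci_obj = CIT(small, "kci", kernelZ='Polynomial', approx=False, est_width='median')
# print(kci_obj(0, 1))
```

Under these assumptions, 8000 samples cost about 512 times as much as 1000 samples, and each dense kernel matrix takes about 512 MB. Repeating the test on several different subsamples (different `seed` values) gives a sense of how stable the p-value is.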