syanga / pycit

(Conditional) Independence testing & Markov blanket feature selection using k-NN mutual information and conditional mutual information estimators. Supports continuous, discrete, and mixed data, as well as multiprocessing.
MIT License

Dataframe containing mixed data cannot be processed by itest(). #3

Closed by annaehhh 1 year ago

annaehhh commented 1 year ago

I have a NumPy array containing mixed data (categorical and continuous). Passing it to itest() as

pval= itest(X, Y, test_args={'statistic': 'mixed_mi', 'n_jobs': 1})

raises the following error:

ValueError: could not convert string to float: 'NO'

from:

File [c:\XXX.venv\Lib\site-packages\sklearn\utils\_array_api.py:185], in _asarray_with_order(array, dtype, order, copy, xp)
    182 xp, _ = get_namespace(array)
    183 if xp.__name__ in {"numpy", "numpy.array_api"}:
    184     # Use NumPy API to support order
--> 185     array = numpy.asarray(array, order=order, dtype=dtype)
    186     return xp.asarray(array, copy=copy)
    187 else:

How do I input mixed data in order to test it for independence?

syanga commented 1 year ago

At the moment, itest() only supports numerical datatypes. Please convert string values such as 'NO' and 'YES' to numbers first, for example 0.0 and 1.0.
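
A minimal sketch of that conversion, using only NumPy. The helper name encode_categorical_columns and the sample array are hypothetical and not part of pycit; the itest() call is left as a comment and simply repeats the arguments from the original question.

```python
import numpy as np


def encode_categorical_columns(data):
    """Return a float array; columns that cannot be parsed as numbers
    are replaced by integer codes (one code per distinct label).

    Hypothetical helper for illustration, not part of pycit.
    """
    data = np.asarray(data, dtype=object)
    if data.ndim == 1:
        data = data.reshape(-1, 1)

    encoded = np.empty(data.shape, dtype=float)
    for j in range(data.shape[1]):
        column = data[:, j]
        try:
            # Numeric columns pass through unchanged.
            encoded[:, j] = column.astype(float)
        except (TypeError, ValueError):
            # String columns: np.unique sorts the labels, and
            # return_inverse gives each row's index into that sorted list.
            _, codes = np.unique(column.astype(str), return_inverse=True)
            encoded[:, j] = codes
    return encoded


# Made-up sample: one categorical column, one continuous column.
X_raw = np.array(
    [["NO", 1.5], ["YES", 0.3], ["NO", 2.1], ["YES", 0.7]],
    dtype=object,
)
X = encode_categorical_columns(X_raw)  # 'NO' becomes 0.0, 'YES' becomes 1.0

# With pycit installed and Y prepared the same way, the call from the
# question can then be used as-is:
# pval = itest(X, Y, test_args={'statistic': 'mixed_mi', 'n_jobs': 1})
```

For a binary column this reproduces the 0.0/1.0 mapping suggested above. With more than two categories the integer codes impose an arbitrary ordering on the labels; whether that is appropriate for the 'mixed_mi' statistic is not addressed in this thread.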