pafoster / pyitlib

A library of information-theoretic methods for data analysis and machine learning, implemented in Python and NumPy.
MIT License

ValueError: y contains previously unseen labels: [-1] #7

Open GXY2017 opened 3 years ago

GXY2017 commented 3 years ago

Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]

Before I go further, I’d like to report this error. I know this library is not meant for Python 3.

I tried this simple example and got the error in the title:

drv.entropy(['e', 'f', 'g', 't'], base=np.exp(1))

I suspect the error comes from scikit-learn's LabelEncoder, which has to be fitted with fit() before transform() is called.

See this link: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

Update: the problem is in _map_observations_to_integers(). Everything needs a mask, and fill_value is what triggers the errors.
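The fit-then-transform behaviour suspected above can be sketched without any dependencies. The block below is a simplified stand-in, not scikit-learn's or pyitlib's actual code: `TinyLabelEncoder` is a hypothetical name, and the error text is only modelled on the message in the issue title. It shows why encoding a value such as -1 fails when that value was not among the labels seen during fitting, and that the call succeeds once the fill value is part of the fitted labels.

```python
# Simplified stand-in for a fit-then-transform label encoder.
# TinyLabelEncoder is a hypothetical class for illustration only;
# it is NOT the implementation used by scikit-learn or pyitlib.


class TinyLabelEncoder:
    def fit(self, labels):
        # Remember the sorted set of distinct labels seen during fitting.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Refuse any label that was not present when fit() was called.
        unseen = [label for label in labels if label not in self._index]
        if unseen:
            raise ValueError("y contains previously unseen labels: %s" % unseen)
        return [self._index[label] for label in labels]


observations = ['e', 'f', 'g', 't']
encoder = TinyLabelEncoder().fit(observations)
encoded = encoder.transform(observations)

# Encoding a fill value that was never fitted raises ValueError.
try:
    encoder.transform([-1])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Assumption for this sketch: the fill value is given as a string so that
# all labels remain sortable, and it is included when fitting.
encoder_with_fill = TinyLabelEncoder().fit(observations + ['-1'])
fill_code = encoder_with_fill.transform(['-1'])
```

In this sketch the fill value only becomes encodable because it is passed to fit() together with the observations; whether that is the right fix for pyitlib itself is not established in this thread.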

smarie commented 3 years ago

I have the same issue. It seems to come from the use of -1 as the default fill value when the dtype is object and the array is a NumPy array:

import pyitlib.discrete_random_variable as drv 
import numpy as np

drv.entropy_conditional(np.array([['A', 'B', 'R'], ['A', 'B', 'A']]))

yields

  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 3495, in entropy_conditional
    fill_value_Alphabet_Y))
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 4689, in _map_observations_to_integers
    Fill_values = [L.transform(np.atleast_1d(f)) for f in Fill_values]
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 4689, in <listcomp>
    Fill_values = [L.transform(np.atleast_1d(f)) for f in Fill_values]
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\sklearn\preprocessing\_label.py", line 277, in transform
    _, y = _encode(y, uniques=self.classes_, encode=True)
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\sklearn\preprocessing\_label.py", line 122, in _encode
    check_unknown=check_unknown)
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\sklearn\preprocessing\_label.py", line 51, in _encode_numpy
    % str(diff))
ValueError: y contains previously unseen labels: [-1]

However, when I specify a different fill value, I get a different problem:

drv.entropy_conditional(np.array([['A', 'B', 'R'], ['A', 'B', 'A']]), fill_value='na')

yields

  File "<ipython-input-7-f3179c2d30c8>", line 1, in <module>
    drv.entropy_conditional(np.array([['A', 'B', 'R'], ['A', 'B', 'A']]), fill_value='na')
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 3495, in entropy_conditional
    fill_value_Alphabet_Y))
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 4695, in _map_observations_to_integers
    assert(np.all([A.dtype == 'int' for A in Symbol_matrices]))
AssertionError

And finally, using an explicit masked NumPy array leads to another error:

drv.entropy_conditional(np.ma.array([['A', 'B', 'R'], ['A', 'B', 'A']]))

yields

  File "<ipython-input-9-3036d1e07c6a>", line 1, in <module>
    drv.entropy_conditional(np.ma.array([['A', 'B', 'R'], ['A', 'B', 'A']]))
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 3441, in entropy_conditional
    Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pyitlib\discrete_random_variable.py", line 4709, in _sanitise_array_input
    if np.any(np.equal(X, None)) or fill_value is None:
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\numpy\ma\core.py", line 3019, in __array_finalize__
    self._fill_value = _check_fill_value(self._fill_value, self.dtype)
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\numpy\ma\core.py", line 480, in _check_fill_value
    raise TypeError(err_msg % (fill_value, ndtype))
TypeError: Cannot convert fill_value N/A to dtype bool