mindsdb / mindsdb_native

Machine Learning in one line of code
http://mindsdb.com
GNU General Public License v3.0

Can't train predictor with multiple targets from CH DS #283

Closed StpMax closed 4 years ago

StpMax commented 4 years ago

When I try to predict multiple targets using a ClickHouse datasource (CH DS), I get this error:

Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/controllers/predictor.py", line 304, in learn
    self.transaction.run()
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/controllers/transaction.py", line 239, in run
    self._run()
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/controllers/transaction.py", line 235, in _run
    raise e
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/controllers/transaction.py", line 198, in _run
    self._call_phase_module(module_name='DataCleaner')
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/controllers/transaction.py", line 165, in _call_phase_module
    ret = module(self.session, self)(**kwargs)
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/phases/base_module.py", line 53, in __call__
    ret = self.run(**kwargs)
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/phases/data_cleaner/data_cleaner.py", line 71, in run
    self._remove_missing_targets(df)
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb_native/mindsdb_native/libs/phases/data_cleaner/data_cleaner.py", line 22, in _remove_missing_targets
    df.dropna(subset=self.transaction.lmd['predict_columns'], inplace=True)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/pandas/core/frame.py", line 4994, in dropna
    raise KeyError(list(np.compress(check, subset)))
KeyError: ['location']

I know we want to remove multi-target prediction, but we haven't done so yet, and it is pretty strange to get this error only with the CH datasource. To reproduce it, you need a connection to CH with the test_data.home_rentals dataset:

import mindsdb_native  # needed for mindsdb_native.Predictor below
from mindsdb_native import ClickhouseDS
ds = ClickhouseDS(
    query='select * from test_data.home_rentals limit 100',
    user='default',
    password='',
    host='127.0.0.1',
    port=8123
)
mindsdb_native.Predictor(name='zzz').learn(
    from_data=ds,
    to_predict=['rental_price', 'location'],
    stop_training_in_x_seconds=3,
    use_gpu=False
)

StpMax commented 4 years ago

I tried to find when it first appeared, and it looks like it was introduced in this commit: 226212f45b8a30fcfc93613f916bb6f8e2f6a818
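
For reference, the last frame of the traceback can be reproduced with pandas alone: `DataFrame.dropna` raises `KeyError` when `subset` names a column that is not in the frame. So by the time `_remove_missing_targets` runs, `location` is not among the DataFrame's columns. The sketch below only illustrates that behaviour and one possible way to check for it. The frame contents and the `missing_targets` helper are hypothetical and not part of mindsdb_native, and it does not say why the column is missing when the data comes from ClickHouse.

```python
import pandas as pd


def missing_targets(df, predict_columns):
    """Hypothetical helper: list the target columns that are absent from df."""
    return [col for col in predict_columns if col not in df.columns]


# Made-up frame standing in for what a datasource returned; note that it has
# no 'location' column.
df = pd.DataFrame({
    "rental_price": [1200.0, None, 950.0],
    "sqft": [50, 60, 45],
})
targets = ["rental_price", "location"]

# Same call as in data_cleaner.py: fails because 'location' is not a column.
try:
    df.dropna(subset=targets, inplace=True)
    error = None
except KeyError as exc:
    error = exc

# Checking up front gives a clearer picture of which targets are missing.
absent = missing_targets(df, targets)

# Dropping rows only on the targets that do exist works as expected.
present = [col for col in targets if col in df.columns]
cleaned = df.dropna(subset=present)
```

Printing `df.columns` for the ClickHouse-backed frame just before the `dropna` call would show whether `location` was dropped earlier in the pipeline or came back under a different name.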