mindsdb / mindsdb_native

Machine Learning in one line of code
http://mindsdb.com
GNU General Public License v3.0

Error while training the 'diamonds' predictor #392

Closed: StpMax closed this issue 3 years ago

StpMax commented 3 years ago

How to reproduce:

  1. Run MindsDB with the HTTP API.
  2. In Scout, upload the 'diamonds' CSV.
  3. Start training a predictor with any field as the target (a programmatic sketch of the same steps follows below).
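For reference, the same repro can be driven from Python without Scout. A minimal sketch, assuming the dataset has been saved locally as diamonds.csv and 'price' is chosen as the target column (both are assumptions, not part of the original report):

```python
# Minimal repro sketch using mindsdb_native's Python API directly.
# Assumptions: diamonds.csv is a local copy of the uploaded dataset and
# 'price' is the column chosen as the prediction target.
from mindsdb_native import Predictor

predictor = Predictor(name='diamonds')

# With a CUDA device visible, training goes through Lightwood's NnMixer
# and fails with the CUDA error shown below.
predictor.learn(
    from_data='diamonds.csv',
    to_predict='price'
)
```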

The head of the error is the same as in https://github.com/mindsdb/lightwood/issues/355, but it is probably not related to that issue.

ERROR:mindsdb-logger-5a2adb04-5720-11eb-a3e2-2c56dc4ecd27---no_report:/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/phases/model_interface/lightwood_backend.py:417 - Traceback (most recent call last):
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/phases/model_interface/lightwood_backend.py", line 411, in train
    test_data=lightwood_test_ds
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/lightwood/api/predictor.py", line 137, in learn
    self._mixer.fit(train_ds=train_ds, test_ds=test_ds)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/lightwood/mixers/base_mixer.py", line 37, in fit
    self._fit(train_ds, test_ds, **kwargs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/lightwood/mixers/nn.py", line 270, in _fit
    for epoch, training_error in enumerate(self._iter_fit(subset_train_ds, subset_id=subset_id)):
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/lightwood/mixers/nn.py", line 571, in _iter_fit
    outputs = self.net(inputs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/lightwood/mixers/helpers/default_net.py", line 125, in forward
    output = self._foward_net(input)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device

ERROR:mindsdb-logger-5a2adb04-5720-11eb-a3e2-2c56dc4ecd27---no_report:/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/phases/model_interface/lightwood_backend.py:418 - Exception while running NnMixer

ERROR:mindsdb-logger-5a2adb04-5720-11eb-a3e2-2c56dc4ecd27---no_report:/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/transaction.py:181 - Could not load module ModelInterface

ERROR:mindsdb-logger-5a2adb04-5720-11eb-a3e2-2c56dc4ecd27---no_report:/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/transaction.py:269 - All models had an error while training

Process PredictorProcess-1:1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/maxs/dev/mdb/venv_new/sources/mindsdb/mindsdb/interfaces/native/predictor_process.py", line 33, in run
    **kwargs
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/predictor.py", line 302, in learn
    self.transaction.run()
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/transaction.py", line 274, in run
    self._run()
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/transaction.py", line 270, in _run
    raise e
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/transaction.py", line 246, in _run
    self._call_phase_module(module_name='ModelInterface', mode='train')
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/controllers/transaction.py", line 178, in _call_phase_module
    ret = module(self.session, self)(**kwargs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/phases/base_module.py", line 53, in __call__
    ret = self.run(**kwargs)
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/phases/model_interface/model_interface.py", line 19, in run
    self.transaction.model_backend.train()
  File "/home/maxs/dev/mdb/venv_new/lib/python3.6/site-packages/mindsdb_native/libs/phases/model_interface/lightwood_backend.py", line 446, in train
    raise Exception('All models had an error while training')
Exception: All models had an error while training
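For context (not part of the original report): a "CUDA error: no kernel image is available for execution on the device" from torch usually means the installed torch build does not ship compiled kernels for this GPU's compute capability. A small diagnostic sketch that makes the mismatch visible:

```python
# Diagnostic sketch for "no kernel image is available for execution on the device".
# Compares the GPU's compute capability against the architectures the installed
# torch build ships kernels for; a capability missing from that list explains the error.
import torch

print('torch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())

if torch.cuda.is_available():
    # Compute capability of the physical GPU, e.g. (7, 5) for a Turing card.
    print('device capability:', torch.cuda.get_device_capability(0))
    # Architectures this torch build ships kernels for, e.g. ['sm_37', ..., 'sm_75'].
    print('kernel archs in this build:', torch.cuda.get_arch_list())
```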
George3d6 commented 3 years ago

@StpMax did you set use_gpu=False?

Anyway, if you remember, remind me of this during a call and we can debug it together; I assume/hope there is an easy fix.

StpMax commented 3 years ago

@George3d6 yeah, it trains if use_gpu=False is set. The error only happens when it is not set.
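For anyone hitting the same error, the workaround discussed here looks roughly like this. A sketch, assuming use_gpu is accepted as a keyword argument to learn() in this mindsdb_native version (the exact placement of the option may differ between releases), and reusing the assumed diamonds.csv/'price' repro from above:

```python
# Workaround sketch: force CPU training so Lightwood's NnMixer never touches CUDA.
# Assumption: this mindsdb_native version accepts use_gpu as a learn() keyword.
from mindsdb_native import Predictor

predictor = Predictor(name='diamonds')
predictor.learn(
    from_data='diamonds.csv',  # assumed local copy of the dataset
    to_predict='price',        # assumed target column
    use_gpu=False              # the option discussed in this thread
)
```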

paxcema commented 3 years ago

Closing for now, as I'm pretty sure the root cause was this Lightwood issue.