mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

[LightGBM] [Fatal] Unknown token <2737 in data file #535

Open xucian opened 2 years ago

xucian commented 2 years ago

Hello. I'm getting the error below. Any clue what causes it? The message hints at a RAM problem, but I'm only using 11 of 48 GB of RAM. The problem seems to appear only with LightGBM; other models work fine. I'm using a dataset with 300k data points, each with 9k features, so it should fit in memory IMO.

[LightGBM] [Warning] Unknown token <2737 in data file
[LightGBM] [Fatal] Unknown token <2737 in data file
2022-04-08 16:54:44,179 concurrent.futures ERROR exception calling callback for <Future at 0x210b25f4310 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\externals\loky\_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\parallel.py", line 359, in __call__
    self.parallel.dispatch_next()
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\parallel.py", line 794, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\_parallel_backends.py", line 531, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\externals\loky\reusable_executor.py", line 177, in submit
    return super(_ReusablePoolExecutor, self).submit(
  File "C:\dev\py.trading.binance.bot\venv\lib\site-packages\joblib\externals\loky\process_executor.py", line 1115, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

pplonski commented 2 years ago

There might be too little RAM available, as the error message suggests.
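
As a rough sanity check, the size of the dataset can be estimated by hand. The sketch below assumes the 300k x 9k dataset is held as a dense array of float64 or float32 values. The thread does not confirm either the dtype or the layout, and `dense_array_gib` is a hypothetical helper written for illustration.

```python
def dense_array_gib(n_rows: int, n_cols: int, bytes_per_value: int) -> float:
    """Return the size in GiB of a dense n_rows x n_cols array."""
    return n_rows * n_cols * bytes_per_value / 2**30


# Dataset shape reported in the issue; the dtypes are assumptions.
n_rows, n_cols = 300_000, 9_000

float64_gib = dense_array_gib(n_rows, n_cols, 8)  # 8 bytes per float64 value
float32_gib = dense_array_gib(n_rows, n_cols, 4)  # 4 bytes per float32 value

print(f"float64: {float64_gib:.1f} GiB")  # float64: 20.1 GiB
print(f"float32: {float32_gib:.1f} GiB")  # float32: 10.1 GiB
```

If the data is float64, a single copy is already about 20 GiB. Should a worker process or the library end up holding a copy of its own (an assumption, not something the traceback shows), a couple of such copies on top of the 11 GB already in use could approach or exceed 48 GB.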