Closed: jackmorgenson closed this issue 3 years ago
Hi @jackmorgenson! I wasn't aware of such a problem. Thank you.
This should be easy to add. I will add it in the next release, 0.9.0.
What do you mean that MLJAR AutoML fails? Are there any errors, or was it just slow to train?
Linear, LightGBM, XGBoost, anything that uses `n_jobs`... the training will eventually time out. I ran the multi-class classification example as-is. Here is a comparison: one run finished in 226.03 seconds; the other timed out (sorry, I forget what the timeout was, but it was at least 30 minutes). Anyway, CPU usage was consistently at 800%, but there were dozens and dozens of Python processes trying to run because the OS can see 88 processors. So, in our organization we always have folks set `n_jobs` to the number of logical processors they specified when spawning the notebook.
The `n_jobs` parameter is added in the `AutoML()` constructor. By default it is set to `-1`, which means that all CPUs will be used. Unfortunately, the MLP implementation from sklearn doesn't support the `n_jobs` parameter, so when the number of jobs is set to anything other than `-1`, the Neural Network algorithm is disabled (not trained at all).
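For reference, a minimal sketch of using it (the dataset here is just a stand-in; the `n_jobs` argument is the parameter described above):

```python
from sklearn.datasets import load_iris
from supervised.automl import AutoML  # mljar-supervised

X, y = load_iris(return_X_y=True)

# Limit training to 4 CPUs. Note: any value other than -1 also
# disables the sklearn MLP-based Neural Network algorithm, as noted above.
automl = AutoML(n_jobs=4)
automl.fit(X, y)
```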
The changes will go into the next release, 0.9.0. Right now they are in the dev branch. To install the package with the newest changes, please run:

```
pip install -U git+https://github.com/mljar/mljar-supervised.git@dev
```
@pplonski I'm using 0.10.4, and when I set `n_jobs=1` in `AutoML()`, it fires up every single core when I check it in htop.
It seems that `n_jobs` may not be behaving as desired. I only want it to use 10 cores at a time, and no more than that, since I work on a shared server.
Any help to this regard would be greatly appreciated!
@tijeco there was a bug where feature importance computation was using all cores - https://github.com/mljar/mljar-supervised/issues/398 - it is fixed in 0.10.6. Do you compute feature importance?
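In the meantime, one way to sidestep the importance computation entirely is to turn explanations off; a sketch assuming the `explain_level` constructor parameter (0 = no explanations at all):

```python
from supervised.automl import AutoML  # mljar-supervised

# explain_level=0 skips all explanations, including feature
# importance, so the code path from #398 is never reached.
automl = AutoML(n_jobs=1, explain_level=0)
```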
@pplonski thanks! I didn't notice the other issue. I'll get 0.10.6 and see how that goes.
I do intend to calculate feature importance!
@pplonski So I monitored htop closely as it ran with `n_jobs=1` on a smallish classification dataset.
When XGBoost starts, all the cores start lighting up. Maybe there is something in the XGBoost code that disregards `n_jobs`?
@pplonski I found a workaround for the meantime! I just learned about `taskset`, and honestly I'm embarrassed that I hadn't heard of it sooner. It restricts which CPUs a given process (and its threads) may run on.

```
taskset -c 0-10 python xxx.py
```

pins the process to CPUs 0 through 10 (note that's 11 cores; use `-c 0-9` for exactly 10).
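If wrapping the command is inconvenient, the same pinning can be done from inside the script on Linux; a sketch using the standard library's `os.sched_setaffinity`:

```python
import os

# Pin the current process (pid 0) to CPUs 0-9 before any worker
# threads or subprocesses start; they inherit the affinity mask.
# Linux-only; equivalent in effect to `taskset -c 0-9`.
os.sched_setaffinity(0, set(range(10)))
```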
@tijeco glad that you found a solution; it might be something with the external packages.
I found this Stack Overflow discussion: https://stackoverflow.com/questions/48269248/limiting-the-number-of-threads-used-by-xgboost
The solution there was to set, before importing any OpenMP-backed libraries such as XGBoost:

```python
import os
os.environ['OMP_NUM_THREADS'] = "1"
```

Maybe you can try running your code with `OMP_NUM_THREADS` set at the very beginning?
I've read (and negatively experienced) that `n_jobs = -1` is hard-coded for applicable models such as LightGBM, XGBoost, etc. My organization runs Jupyter notebooks in a Kubernetes containerized environment. Unfortunately, that means the container OS and Python see all CPU cores of the underlying physical host. For example, a notebook is spawned with 4 logical processors, yet `multiprocessing.cpu_count()` reveals 88 processors, as it can see through the container layer. With `n_jobs` hard-coded to -1, the MLJAR AutoML fails because it thinks it can run 88 parallel threads on 4 logical processors, so basically the notebook just freezes and panics because of all of the CPU thrashing.
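For illustration, the mismatch is easy to demonstrate from inside such a container; `os.sched_getaffinity` reports the CPUs actually granted to the process (Linux-only, and only when the limit is enforced via cpusets rather than CFS quota):

```python
import multiprocessing
import os

# Sees straight through the container to the physical host:
print(multiprocessing.cpu_count())    # e.g. 88

# CPUs this process may actually run on (Linux-only); reflects
# cpuset-style limits, while CFS-quota limits won't show up here.
print(len(os.sched_getaffinity(0)))   # e.g. 4 when a cpuset is applied
```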
Please consider making `n_jobs` configurable. Thanks!