mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License
2.97k stars, 392 forks

Not using all CPU Cores for CatBoost #425

Open nathanballou opened 3 years ago

nathanballou commented 3 years ago

When training, the first few models seem to utilize multiple cores, but starting with CatBoost, training seems to fall back to a single core in mljar-supervised version 0.10.6. This did not appear to be an issue before I updated to this version.

I am using CatBoost version 0.26.

I am using an m5.24xlarge AWS instance and am seeing 1.3% CPU utilization.

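One quick sanity check for a report like this is whether the Python process is allowed to run on all cores in the first place. The snippet below is a minimal sketch using only the standard library. The helper name `available_cores` is made up for illustration, and it does not inspect anything inside mljar-supervised or CatBoost.

```python
import os


def available_cores():
    """Return (logical cores on the machine, cores this process may run on)."""
    total = os.cpu_count() or 1
    # os.sched_getaffinity exists only on some platforms (e.g. Linux);
    # elsewhere we assume the process may use every logical core.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return total, usable


total, usable = available_cores()
print(f"logical cores: {total}, usable by this process: {usable}")
```

If `usable` is much smaller than `total`, the limit comes from the environment (for example a container or CPU pinning) rather than from the training code. If both numbers match the instance size, the next thing to look at would be the thread settings of the model itself, such as CatBoost's `thread_count` parameter.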

pplonski commented 2 years ago

@NaRobBoo do you see this behavior for any dataset or just for your data?

markdickson commented 2 years ago

I ran into this same problem while using MLJar on the IEEE-Kaggle Fraud dataset.

markdickson commented 2 years ago

I am amending my previous comment: after watching the CPU monitor for a while, MLJar did eventually use all my cores. There were long intervals of single-core utilization separated by shorter periods of multiple-core utilization.
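To put numbers on the pattern described above, per-core utilization can be sampled at a fixed interval and grouped into runs. The following is a rough sketch only. `summarize_intervals` and the 50% threshold are illustrative assumptions, and the sample data is invented to show the shape of the output. Real samples could come from a tool such as `psutil.cpu_percent(percpu=True)`, which is not used here so that the example stays dependency-free.

```python
def summarize_intervals(samples, busy_threshold=50.0):
    """Group consecutive samples into runs of 'idle', 'single' or 'multi' core use.

    samples: list of per-core utilization lists (percent), one list per tick.
    Returns a list of (label, number_of_consecutive_ticks) tuples.
    """
    intervals = []
    for per_core in samples:
        # Count cores that are clearly busy in this tick.
        busy = sum(1 for pct in per_core if pct >= busy_threshold)
        if busy == 0:
            label = "idle"
        elif busy == 1:
            label = "single"
        else:
            label = "multi"
        # Extend the current run or start a new one.
        if intervals and intervals[-1][0] == label:
            intervals[-1] = (label, intervals[-1][1] + 1)
        else:
            intervals.append((label, 1))
    return intervals


# Invented example: 4 cores, 4 ticks.
samples = [[95, 2, 1, 3], [97, 1, 0, 2], [90, 88, 92, 85], [99, 3, 2, 1]]
print(summarize_intervals(samples))
# → [('single', 2), ('multi', 1), ('single', 1)]
```

Logging the timestamps of each run next to the AutoML training log would show which steps coincide with the long single-core stretches.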

pplonski commented 2 years ago

@markdickson thank you for checking this. If I find time, I will check what AutoML is doing during the long single-core intervals.