microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

No Improvement in Training Time with more Cores on LightGBM #6730

Open · abhishekagrawala opened this issue 4 days ago

abhishekagrawala commented 4 days ago

Description

Training on a ~6GB dataset with LightGBM and n_jobs=70 does not yield a proportional reduction in training time. Despite running on a 72-core machine with a high n_jobs value, the training time is barely lower than with fewer threads.

Environment

OS: Linux 6.1.0-27-cloud-amd64 Debian
CPU:
  Architecture:             x86_64
  CPU(s):                   72  
    - Model Name:           Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz  
    - Cores:                72 (1 thread per core)  
    - Flags:                AVX, AVX2, AVX512, FMA, etc.  
  Cache:                    288 MB L2 Cache, 16 MB L3 Cache
  NUMA Node(s):             1  
Memory:
                    total        used        free      shared  buff/cache   available  
      Mem:           491Gi        81Gi       399Gi       1.1Mi        15Gi       410Gi  
      Swap:           79Gi        84Mi        79Gi  
Storage:
  Filesystem      Size  Used Avail Use% Mounted on  
  udev            246G     0  246G   0% /dev  
  tmpfs            50G  1.8M   50G   1% /run  
  /dev/sda1       197G  104G   86G  55% /  
  tmpfs           246G     0  246G   0% /dev/shm  
  tmpfs           5.0M     0  5.0M   0% /run/lock  
  /dev/sda15      124M   12M  113M  10% /boot/efi  
  tmpfs            50G     0   50G   0% /run/user/10476  
  tmpfs            50G     0   50G   0% /run/user/90289  
  tmpfs            50G     0   50G   0% /run/user/1003  
VM Type: Custom VM in a cloud environment.

LightGBM Setup

  Version: 3.2.1 (conda build py38h709712a_0)
  Parameters: n_estimators=325, num_leaves=512, colsample_bytree=0.2, min_data_in_leaf=80, max_depth=22, learning_rate=0.09, objective="binary", n_jobs=70, boost_from_average=True, max_bin=200, bagging_fraction=0.999, lambda_l1=0.29, lambda_l2=0.165 (a reproduction sketch is included below)
Dataset:
  Size: ~6GB
  Characteristics: Binary classification problem, categorical and numerical features, preprocessed and balanced.
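
For concreteness, here is a minimal reproduction sketch of how the configuration above maps onto the scikit-learn API. The synthetic X / y are placeholders standing in for the real ~6GB dataset and are not part of the original setup.

```python
# Minimal reproduction sketch -- synthetic X / y below are placeholders for
# the real ~6GB dataset, not the actual data from this report.
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(42)
X = rng.random((1_000_000, 100))          # stand-in feature matrix
y = rng.integers(0, 2, size=1_000_000)    # stand-in binary labels

model = LGBMClassifier(
    n_estimators=325,
    num_leaves=512,
    colsample_bytree=0.2,
    min_data_in_leaf=80,
    max_depth=22,
    learning_rate=0.09,
    objective="binary",
    n_jobs=70,
    boost_from_average=True,
    max_bin=200,
    bagging_fraction=0.999,
    lambda_l1=0.29,
    lambda_l2=0.165,
)
model.fit(X, y)
```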
Performance Issues
Current Performance:
    Training time with n_jobs=32: ~25 minutes
    Training time with n_jobs=70: ~23 minutes
Expected Performance:
    Substantial reduction in training time when utilizing 70 cores, ideally below 10 minutes.
Bottleneck Symptoms:
    Minimal reduction in training time with increased cores (n_jobs); a rough scaling estimate based on the two timings above is sketched after this list.
    CPU utilization remains low, with individual threads not fully utilized.
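
As a rough illustration (not part of the original report), fitting the two reported timings to a simple Amdahl-style model T(n) = serial + parallel_work / n suggests that most of each run is spent in work that does not scale with thread count:

```python
# Rough Amdahl-style estimate from the two reported timings (illustrative only).
# Model: T(n) = serial + parallel_work / n, fitted to T(32)=25 min and T(70)=23 min.
t32, t70 = 25.0, 23.0   # minutes, from the measurements above
n1, n2 = 32, 70

parallel_work = (t32 - t70) / (1 / n1 - 1 / n2)   # single-thread minutes of scalable work
serial = t32 - parallel_work / n1                  # minutes that do not scale with threads

print(f"scalable work ~ {parallel_work:.0f} thread-minutes")
print(f"non-scaling component ~ {serial:.1f} minutes per run")
# With these two data points the non-scaling component dominates (~21 min),
# which is consistent with extra threads barely changing total time.
```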
System Metrics During Training
   CPU Utilization:
      Average utilization: ~40%  
      Peak utilization: ~55%  
      Core-specific activity: Most cores show low activity levels (<30%)  
   Memory Usage:
      Utilized during training: ~81Gi  
      Free memory: ~399Gi  
      Swap usage: ~84Mi  
  Disk I/O:
      Read: ~50MB/s  
      Write: ~30MB/s  
      I/O wait time: ~2%

Request for Support

1. Explanation of why n_jobs scaling is not improving training time.
2. Suggestions for configurations to fully utilize 70 cores for LightGBM training.
3. Recommendations for debugging and monitoring specific to LightGBM threading or system-level bottlenecks.
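
One way to narrow this down (a suggestion, not something from the report) is to time the core training call at several num_threads values on a fixed sample of the data, so that data loading and preprocessing are excluded from the measurement. A sketch under those assumptions follows; the file path and label column name are hypothetical placeholders.

```python
# Sketch of a num_threads scaling sweep (illustrative; the data path and
# "label" column name are placeholders, not from the original report).
import time
import lightgbm as lgb
import pandas as pd

df = pd.read_parquet("train_sample.parquet")   # hypothetical sample of the real dataset
X, y = df.drop(columns=["label"]), df["label"]

params = {
    "objective": "binary",
    "num_leaves": 512,
    "max_depth": 22,
    "learning_rate": 0.09,
    "max_bin": 200,
    "verbosity": -1,
}

train_set = lgb.Dataset(X, label=y)

for num_threads in (1, 8, 16, 32, 70):
    start = time.perf_counter()
    lgb.train({**params, "num_threads": num_threads}, train_set, num_boost_round=50)
    print(f"num_threads={num_threads}: {time.perf_counter() - start:.1f}s")
```

If the per-thread timings from a sweep like this flatten out early, that points at the training step itself rather than I/O or preprocessing, and is the kind of detail that helps maintainers diagnose threading issues.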

jameslamb commented 4 days ago

Thanks for using LightGBM. I've attempted to reformat your post a bit to make it easier to read... if you are new to markdown / GitHub, please see https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax for some tips on making such changes yourself.

You haven't provided enough information yet for us to help you with this report.