ogrisel commented 6 years ago

Tentative fix for #31.

ogrisel commented 6 years ago

At this time some tests are broken (need to be updated) and ~~the code runs slightly slower than on master for some reason I do not understand yet~~ (on my laptop the speed is the same as on master).

The memory leak issue should be fixed though.

ogrisel commented 6 years ago

The slowdown we observe when updating the packed histograms array inside the parallel for loop, which we do not observe on master where the packed histograms array is filled sequentially, might be a case of false sharing.

@NicolasHug noted that false sharing might not be a problem for a write-only data structure updated in a parallel for loop. I don't know.
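One way to reason about the false sharing hypothesis is to count how many cache lines are touched by more than one per-feature histogram, since those are the only lines two threads could contend on. The following is a minimal sketch, not a measurement. It assumes 64-byte cache lines, an array that starts on a cache-line boundary, one thread per feature, and a hypothetical 12-byte histogram entry (two float32 values and one uint32 count). None of these numbers is taken from the PR itself.

```python
from collections import Counter


def shared_cache_lines(n_rows, row_bytes, line_bytes=64):
    """Count cache lines touched by more than one row of a contiguous array.

    Assumes the array starts on a cache-line boundary (an assumption for
    this sketch, not something NumPy guarantees).
    """
    owners = Counter()
    for row in range(n_rows):
        first = (row * row_bytes) // line_bytes
        last = ((row + 1) * row_bytes - 1) // line_bytes
        for line in range(first, last + 1):
            owners[line] += 1
    return sum(1 for count in owners.values() if count > 1)


ENTRY_BYTES = 12  # hypothetical: float32 + float32 + uint32 per bin
N_FEATURES = 28   # number of features in the HIGGS benchmark

for n_bins in (255, 256):
    shared = shared_cache_lines(N_FEATURES, n_bins * ENTRY_BYTES)
    print(f"{n_bins} bins: {shared} cache lines shared between features")
```

Under these assumptions, only the lines that straddle the boundary between two consecutive features can be contended, so most writes land on lines owned by a single thread. This bounds the effect but does not replace a benchmark.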

What I observe, though, is that on a 12-core machine the code runs 2x faster with "tbb" as the `numba.config.THREADING_LAYER` than with "workqueue". "omp" performance is closer to "tbb" than to "workqueue". But even with "tbb", LightGBM is significantly faster than pygbm on this machine.

ogrisel commented 6 years ago

Actually, I tried again with master and the various threading backends, and this PR is either as fast as or faster than master. I must have done something wrong when I reported the initial slowdown.

In any case, tbb is significantly faster than the workqueue backend when the number of cores is large (e.g. 12 in my case).
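To compare threading layers on the same script, each run can be launched in a fresh process with the `NUMBA_THREADING_LAYER` environment variable set, which is the environment counterpart of `numba.config.THREADING_LAYER`. This is a minimal sketch. The helper names and the `benchmark.py` script name are hypothetical, and a layer is only used if it is actually installed.

```python
import os
import subprocess
import sys
import time

# Concrete threading layers discussed in this thread.
KNOWN_LAYERS = ("tbb", "omp", "workqueue")


def layer_env(layer, base_env=None):
    """Return a copy of the environment that requests the given layer."""
    if layer not in KNOWN_LAYERS:
        raise ValueError(f"unknown threading layer: {layer!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["NUMBA_THREADING_LAYER"] = layer
    return env


def time_script(script, layer):
    """Run `script` in a fresh interpreter; return wall time in seconds."""
    tic = time.perf_counter()
    subprocess.run([sys.executable, script], env=layer_env(layer), check=True)
    return time.perf_counter() - tic


# Example (hypothetical script name):
# for layer in KNOWN_LAYERS:
#     print(f"{layer}: {time_script('benchmark.py', layer):.1f}s")
```

The wall time measured this way includes JIT compilation, so the benchmarked script should warm up the compiled functions itself if only the steady-state speed matters.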

ogrisel commented 6 years ago

@NicolasHug I have to go, feel free to update the failing tests and merge this PR.

codecov[bot] commented 6 years ago

Codecov Report

Merging #36 into master will increase coverage by 0.02%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   94.34%   94.36%   +0.02%     
==========================================
  Files           8        8              
  Lines         778      781       +3     
==========================================
+ Hits          734      737       +3     
  Misses         44       44

| Impacted Files | Coverage Δ | |
|---|---|---|
| pygbm/splitting.py | `99.47% <100%> (ø)` | :arrow_up: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 154def0...0960a2f.

NicolasHug commented 6 years ago

Here are the same plots as in https://github.com/ogrisel/pygbm/issues/31#issuecomment-435626534, now with this PR. I don't know how I feel about the 1e7 case.

[Plots: leak, leak2]

We now have the following results with the benchmark from https://github.com/ogrisel/pygbm/issues/30#issue-376370377 (numba is pre-compiled here for a fair comparison):

Laptop with 8GB RAM, i5 7th gen.

LightGBM: 75.408s, ROC AUC: 0.8293

pygbm: 83.022s, ROC AUC: 0.8156

No VIRT explosion

:smile:

Code:

```python
from urllib.request import urlretrieve
import os
from gzip import GzipFile
from time import time

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from joblib import Memory
from pygbm import GradientBoostingMachine
from lightgbm import LGBMRegressor
import numba
import gc


HERE = os.path.dirname(__file__)
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/00280/"
       "HIGGS.csv.gz")
m = Memory(location='/tmp', mmap_mode='r')


@m.cache
def load_data():
    # Download the dataset once, then parse it (the result is cached by joblib)
    filename = os.path.join(HERE, URL.rsplit('/', 1)[-1])
    if not os.path.exists(filename):
        print(f"Downloading {URL} to {filename} (2.6 GB)...")
        urlretrieve(URL, filename)
        print("done.")

    print(f"Parsing {filename}...")
    tic = time()
    with GzipFile(filename) as f:
        df = pd.read_csv(f, header=None, dtype=np.float32)
    toc = time()
    print(f"Loaded {df.values.nbytes / 1e9:0.3f} GB in {toc - tic:0.3f}s")
    return df


df = load_data()

n_leaf_nodes = 255
n_trees = 500
lr = 0.05
max_bins = 255
subsample = 1000000  # Change this to 10000000 if you wish, or to None

# The first column is the target, the remaining columns are the features
target = df.values[:, 0]
data = np.ascontiguousarray(df.values[:, 1:])
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=50000, random_state=0)

if subsample is not None:
    data_train, target_train = data_train[:subsample], target_train[:subsample]

n_samples, n_features = data_train.shape
print(f"Training set with {n_samples} records with {n_features} features.")

gc.collect()

# Fit on a tiny subset first so that numba compilation is not timed later
print("Compiling pygbm...")
tic = time()
pygbm_model = GradientBoostingMachine(learning_rate=lr, max_iter=n_trees,
                                      max_bins=max_bins,
                                      max_leaf_nodes=n_leaf_nodes,
                                      random_state=0, scoring=None,
                                      verbose=0, validation_split=None)
pygbm_model.fit(data_train[:100], target_train[:100])
toc = time()
predicted_test = pygbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del pygbm_model
del predicted_test

print("Fitting a LightGBM model...")
tic = time()
lightgbm_model = LGBMRegressor(n_estimators=n_trees, num_leaves=n_leaf_nodes,
                               learning_rate=lr, silent=False)
lightgbm_model.fit(data_train, target_train)
toc = time()
predicted_test = lightgbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del lightgbm_model
del predicted_test
gc.collect()

print("Fitting a pygbm model...")
tic = time()
pygbm_model = GradientBoostingMachine(learning_rate=lr, max_iter=n_trees,
                                      max_bins=max_bins,
                                      max_leaf_nodes=n_leaf_nodes,
                                      random_state=0, scoring=None,
                                      verbose=1, validation_split=None)
pygbm_model.fit(data_train, target_train)
toc = time()
predicted_test = pygbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del pygbm_model
del predicted_test
gc.collect()

if hasattr(numba, 'threading_layer'):
    # The body of this branch was cut off in the comment; this line is
    # reconstructed from the last line of the log.
    print("Threading layer chosen: " + numba.threading_layer())
```

Log:

``` [LightGBM] [Info] Total Bins 6143 [LightGBM] [Info] Number of data: 1000000, number of used features: 28 [LightGBM] [Info] Start training from score 0.529479 Training set with 1000000 records with 28 features. Compiling pygbm... done in 9.049s, ROC AUC: 0.5340 Fitting a LightGBM model... done in 75.408s, ROC AUC: 0.8293 Fitting a pygbm model... Binning 0.112 GB of data: 1.003 s (111.671 MB/s) Fitting gradient boosted rounds: [0/500] [1/500] 255 leaf nodes, max depth 13 in 0.470s [2/500] 255 leaf nodes, max depth 17 in 0.403s [3/500] 255 leaf nodes, max depth 15 in 0.376s [4/500] 255 leaf nodes, max depth 19 in 0.363s [5/500] 255 leaf nodes, max depth 17 in 0.358s [6/500] 255 leaf nodes, max depth 15 in 0.351s [7/500] 255 leaf nodes, max depth 17 in 0.348s [8/500] 255 leaf nodes, max depth 17 in 0.344s [9/500] 255 leaf nodes, max depth 15 in 0.340s [10/500] 255 leaf nodes, max depth 17 in 0.338s [11/500] 255 leaf nodes, max depth 15 in 0.336s [12/500] 255 leaf nodes, max depth 16 in 0.333s [13/500] 255 leaf nodes, max depth 17 in 0.331s [14/500] 255 leaf nodes, max depth 21 in 0.332s [15/500] 255 leaf nodes, max depth 20 in 0.331s [16/500] 255 leaf nodes, max depth 20 in 0.331s [17/500] 255 leaf nodes, max depth 18 in 0.329s [18/500] 255 leaf nodes, max depth 20 in 0.329s [19/500] 255 leaf nodes, max depth 15 in 0.327s [20/500] 255 leaf nodes, max depth 21 in 0.326s [21/500] 255 leaf nodes, max depth 22 in 0.326s [22/500] 255 leaf nodes, max depth 17 in 0.325s [23/500] 255 leaf nodes, max depth 18 in 0.324s [24/500] 255 leaf nodes, max depth 20 in 0.324s [25/500] 255 leaf nodes, max depth 18 in 0.324s [26/500] 255 leaf nodes, max depth 19 in 0.324s [27/500] 255 leaf nodes, max depth 19 in 0.323s [28/500] 255 leaf nodes, max depth 18 in 0.322s [29/500] 255 leaf nodes, max depth 19 in 0.322s [30/500] 255 leaf nodes, max depth 19 in 0.321s [31/500] 255 leaf nodes, max depth 18 in 0.321s [32/500] 255 leaf nodes, max depth 19 in 0.320s [33/500] 255 leaf nodes, max 
depth 21 in 0.320s [34/500] 255 leaf nodes, max depth 19 in 0.320s [35/500] 255 leaf nodes, max depth 17 in 0.319s [36/500] 255 leaf nodes, max depth 17 in 0.318s [37/500] 255 leaf nodes, max depth 16 in 0.318s [38/500] 255 leaf nodes, max depth 18 in 0.317s [39/500] 255 leaf nodes, max depth 23 in 0.317s [40/500] 255 leaf nodes, max depth 21 in 0.316s [41/500] 255 leaf nodes, max depth 19 in 0.316s [42/500] 255 leaf nodes, max depth 19 in 0.316s [43/500] 255 leaf nodes, max depth 19 in 0.315s [44/500] 255 leaf nodes, max depth 14 in 0.315s [45/500] 255 leaf nodes, max depth 18 in 0.314s [46/500] 255 leaf nodes, max depth 23 in 0.314s [47/500] 255 leaf nodes, max depth 20 in 0.314s [48/500] 255 leaf nodes, max depth 16 in 0.313s [49/500] 255 leaf nodes, max depth 24 in 0.313s [50/500] 255 leaf nodes, max depth 20 in 0.313s [51/500] 255 leaf nodes, max depth 20 in 0.313s [52/500] 255 leaf nodes, max depth 26 in 0.312s [53/500] 255 leaf nodes, max depth 20 in 0.312s [54/500] 255 leaf nodes, max depth 26 in 0.312s [55/500] 255 leaf nodes, max depth 20 in 0.311s [56/500] 255 leaf nodes, max depth 16 in 0.311s [57/500] 255 leaf nodes, max depth 18 in 0.310s [58/500] 255 leaf nodes, max depth 21 in 0.310s [59/500] 255 leaf nodes, max depth 19 in 0.310s [60/500] 255 leaf nodes, max depth 20 in 0.310s [61/500] 255 leaf nodes, max depth 17 in 0.309s [62/500] 255 leaf nodes, max depth 21 in 0.309s [63/500] 255 leaf nodes, max depth 20 in 0.308s [64/500] 255 leaf nodes, max depth 26 in 0.308s [65/500] 255 leaf nodes, max depth 18 in 0.308s [66/500] 255 leaf nodes, max depth 19 in 0.308s [67/500] 255 leaf nodes, max depth 24 in 0.308s [68/500] 255 leaf nodes, max depth 25 in 0.308s [69/500] 255 leaf nodes, max depth 16 in 0.307s [70/500] 255 leaf nodes, max depth 20 in 0.307s [71/500] 255 leaf nodes, max depth 21 in 0.307s [72/500] 255 leaf nodes, max depth 20 in 0.306s [73/500] 255 leaf nodes, max depth 19 in 0.306s [74/500] 255 leaf nodes, max depth 27 in 0.306s [75/500] 255 
leaf nodes, max depth 26 in 0.306s [76/500] 255 leaf nodes, max depth 19 in 0.305s [77/500] 255 leaf nodes, max depth 24 in 0.305s [78/500] 255 leaf nodes, max depth 20 in 0.305s [79/500] 255 leaf nodes, max depth 23 in 0.304s [80/500] 255 leaf nodes, max depth 22 in 0.303s [81/500] 255 leaf nodes, max depth 22 in 0.303s [82/500] 255 leaf nodes, max depth 18 in 0.303s [83/500] 255 leaf nodes, max depth 18 in 0.303s [84/500] 255 leaf nodes, max depth 23 in 0.302s [85/500] 255 leaf nodes, max depth 22 in 0.302s [86/500] 255 leaf nodes, max depth 22 in 0.302s [87/500] 255 leaf nodes, max depth 18 in 0.301s [88/500] 255 leaf nodes, max depth 21 in 0.301s [89/500] 255 leaf nodes, max depth 23 in 0.301s [90/500] 255 leaf nodes, max depth 25 in 0.300s [91/500] 255 leaf nodes, max depth 18 in 0.300s [92/500] 255 leaf nodes, max depth 22 in 0.300s [93/500] 255 leaf nodes, max depth 30 in 0.300s [94/500] 255 leaf nodes, max depth 17 in 0.300s [95/500] 255 leaf nodes, max depth 19 in 0.299s [96/500] 255 leaf nodes, max depth 24 in 0.299s [97/500] 255 leaf nodes, max depth 19 in 0.298s [98/500] 255 leaf nodes, max depth 26 in 0.298s [99/500] 255 leaf nodes, max depth 17 in 0.298s [100/500] 255 leaf nodes, max depth 21 in 0.298s [101/500] 255 leaf nodes, max depth 21 in 0.297s [102/500] 255 leaf nodes, max depth 23 in 0.297s [103/500] 255 leaf nodes, max depth 24 in 0.297s [104/500] 255 leaf nodes, max depth 29 in 0.297s [105/500] 255 leaf nodes, max depth 21 in 0.297s [106/500] 255 leaf nodes, max depth 24 in 0.297s [107/500] 255 leaf nodes, max depth 19 in 0.296s [108/500] 255 leaf nodes, max depth 25 in 0.296s [109/500] 255 leaf nodes, max depth 19 in 0.296s [110/500] 255 leaf nodes, max depth 26 in 0.295s [111/500] 255 leaf nodes, max depth 19 in 0.295s [112/500] 255 leaf nodes, max depth 18 in 0.294s [113/500] 255 leaf nodes, max depth 26 in 0.294s [114/500] 255 leaf nodes, max depth 23 in 0.294s [115/500] 255 leaf nodes, max depth 24 in 0.293s [116/500] 255 leaf nodes, 
max depth 27 in 0.293s [117/500] 255 leaf nodes, max depth 19 in 0.293s [118/500] 255 leaf nodes, max depth 19 in 0.293s [119/500] 255 leaf nodes, max depth 23 in 0.292s [120/500] 255 leaf nodes, max depth 19 in 0.292s [121/500] 255 leaf nodes, max depth 19 in 0.291s [122/500] 255 leaf nodes, max depth 20 in 0.291s [123/500] 255 leaf nodes, max depth 24 in 0.291s [124/500] 255 leaf nodes, max depth 19 in 0.290s [125/500] 255 leaf nodes, max depth 20 in 0.290s [126/500] 255 leaf nodes, max depth 18 in 0.290s [127/500] 255 leaf nodes, max depth 23 in 0.290s [128/500] 255 leaf nodes, max depth 23 in 0.289s [129/500] 255 leaf nodes, max depth 16 in 0.289s [130/500] 255 leaf nodes, max depth 20 in 0.288s [131/500] 255 leaf nodes, max depth 18 in 0.288s [132/500] 255 leaf nodes, max depth 22 in 0.287s [133/500] 255 leaf nodes, max depth 32 in 0.287s [134/500] 255 leaf nodes, max depth 19 in 0.287s [135/500] 255 leaf nodes, max depth 17 in 0.286s [136/500] 255 leaf nodes, max depth 18 in 0.286s [137/500] 255 leaf nodes, max depth 20 in 0.285s [138/500] 255 leaf nodes, max depth 16 in 0.285s [139/500] 255 leaf nodes, max depth 23 in 0.284s [140/500] 255 leaf nodes, max depth 21 in 0.283s [141/500] 255 leaf nodes, max depth 23 in 0.283s [142/500] 255 leaf nodes, max depth 16 in 0.283s [143/500] 255 leaf nodes, max depth 21 in 0.282s [144/500] 255 leaf nodes, max depth 21 in 0.281s [145/500] 255 leaf nodes, max depth 21 in 0.280s [146/500] 255 leaf nodes, max depth 26 in 0.280s [147/500] 255 leaf nodes, max depth 19 in 0.279s [148/500] 255 leaf nodes, max depth 18 in 0.278s [149/500] 255 leaf nodes, max depth 16 in 0.277s [150/500] 255 leaf nodes, max depth 18 in 0.277s [151/500] 255 leaf nodes, max depth 21 in 0.276s [152/500] 255 leaf nodes, max depth 20 in 0.275s [153/500] 255 leaf nodes, max depth 21 in 0.274s [154/500] 255 leaf nodes, max depth 20 in 0.273s [155/500] 255 leaf nodes, max depth 18 in 0.273s [156/500] 255 leaf nodes, max depth 20 in 0.272s [157/500] 255 
leaf nodes, max depth 22 in 0.272s [158/500] 255 leaf nodes, max depth 20 in 0.271s [159/500] 255 leaf nodes, max depth 19 in 0.270s [160/500] 255 leaf nodes, max depth 17 in 0.270s [161/500] 255 leaf nodes, max depth 16 in 0.269s [162/500] 255 leaf nodes, max depth 21 in 0.268s [163/500] 255 leaf nodes, max depth 20 in 0.268s [164/500] 255 leaf nodes, max depth 22 in 0.268s [165/500] 255 leaf nodes, max depth 18 in 0.267s [166/500] 255 leaf nodes, max depth 22 in 0.266s [167/500] 255 leaf nodes, max depth 18 in 0.266s [168/500] 255 leaf nodes, max depth 23 in 0.266s [169/500] 255 leaf nodes, max depth 21 in 0.266s [170/500] 255 leaf nodes, max depth 20 in 0.265s [171/500] 255 leaf nodes, max depth 20 in 0.264s [172/500] 255 leaf nodes, max depth 18 in 0.264s [173/500] 255 leaf nodes, max depth 20 in 0.264s [174/500] 255 leaf nodes, max depth 20 in 0.263s [175/500] 255 leaf nodes, max depth 19 in 0.264s [176/500] 255 leaf nodes, max depth 24 in 0.263s [177/500] 255 leaf nodes, max depth 22 in 0.262s [178/500] 255 leaf nodes, max depth 20 in 0.262s [179/500] 255 leaf nodes, max depth 21 in 0.262s [180/500] 255 leaf nodes, max depth 19 in 0.261s [181/500] 255 leaf nodes, max depth 22 in 0.260s [182/500] 255 leaf nodes, max depth 25 in 0.259s [183/500] 255 leaf nodes, max depth 28 in 0.258s [184/500] 255 leaf nodes, max depth 22 in 0.257s [185/500] 255 leaf nodes, max depth 20 in 0.256s [186/500] 255 leaf nodes, max depth 21 in 0.255s [187/500] 255 leaf nodes, max depth 26 in 0.254s [188/500] 255 leaf nodes, max depth 18 in 0.253s [189/500] 255 leaf nodes, max depth 25 in 0.252s [190/500] 255 leaf nodes, max depth 18 in 0.251s [191/500] 255 leaf nodes, max depth 19 in 0.251s [192/500] 255 leaf nodes, max depth 20 in 0.251s [193/500] 255 leaf nodes, max depth 27 in 0.250s [194/500] 255 leaf nodes, max depth 21 in 0.249s [195/500] 255 leaf nodes, max depth 21 in 0.249s [196/500] 255 leaf nodes, max depth 27 in 0.248s [197/500] 255 leaf nodes, max depth 21 in 0.247s 
[198/500] 255 leaf nodes, max depth 20 in 0.247s [199/500] 255 leaf nodes, max depth 22 in 0.247s [200/500] 255 leaf nodes, max depth 25 in 0.246s [201/500] 255 leaf nodes, max depth 19 in 0.246s [202/500] 255 leaf nodes, max depth 18 in 0.246s [203/500] 255 leaf nodes, max depth 17 in 0.245s [204/500] 255 leaf nodes, max depth 20 in 0.245s [205/500] 255 leaf nodes, max depth 17 in 0.244s [206/500] 255 leaf nodes, max depth 17 in 0.244s [207/500] 255 leaf nodes, max depth 21 in 0.244s [208/500] 255 leaf nodes, max depth 17 in 0.243s [209/500] 255 leaf nodes, max depth 18 in 0.243s [210/500] 255 leaf nodes, max depth 18 in 0.243s [211/500] 255 leaf nodes, max depth 22 in 0.243s [212/500] 255 leaf nodes, max depth 16 in 0.243s [213/500] 255 leaf nodes, max depth 18 in 0.242s [214/500] 255 leaf nodes, max depth 23 in 0.242s [215/500] 255 leaf nodes, max depth 20 in 0.242s [216/500] 255 leaf nodes, max depth 20 in 0.241s [217/500] 255 leaf nodes, max depth 19 in 0.241s [218/500] 255 leaf nodes, max depth 15 in 0.241s [219/500] 255 leaf nodes, max depth 23 in 0.240s [220/500] 255 leaf nodes, max depth 19 in 0.240s [221/500] 255 leaf nodes, max depth 24 in 0.239s [222/500] 255 leaf nodes, max depth 22 in 0.238s [223/500] 255 leaf nodes, max depth 18 in 0.238s [224/500] 255 leaf nodes, max depth 23 in 0.237s [225/500] 255 leaf nodes, max depth 21 in 0.237s [226/500] 255 leaf nodes, max depth 27 in 0.236s [227/500] 255 leaf nodes, max depth 22 in 0.235s [228/500] 255 leaf nodes, max depth 30 in 0.235s [229/500] 255 leaf nodes, max depth 18 in 0.234s [230/500] 255 leaf nodes, max depth 20 in 0.233s [231/500] 255 leaf nodes, max depth 19 in 0.233s [232/500] 255 leaf nodes, max depth 18 in 0.232s [233/500] 255 leaf nodes, max depth 17 in 0.232s [234/500] 255 leaf nodes, max depth 22 in 0.231s [235/500] 255 leaf nodes, max depth 25 in 0.230s [236/500] 255 leaf nodes, max depth 21 in 0.230s [237/500] 255 leaf nodes, max depth 26 in 0.229s [238/500] 255 leaf nodes, max depth 20 
in 0.228s [239/500] 255 leaf nodes, max depth 26 in 0.228s [240/500] 255 leaf nodes, max depth 26 in 0.227s [241/500] 255 leaf nodes, max depth 21 in 0.227s [242/500] 255 leaf nodes, max depth 19 in 0.226s [243/500] 255 leaf nodes, max depth 18 in 0.226s [244/500] 255 leaf nodes, max depth 19 in 0.226s [245/500] 255 leaf nodes, max depth 21 in 0.225s [246/500] 255 leaf nodes, max depth 25 in 0.225s [247/500] 255 leaf nodes, max depth 21 in 0.224s [248/500] 255 leaf nodes, max depth 22 in 0.223s [249/500] 255 leaf nodes, max depth 27 in 0.223s [250/500] 255 leaf nodes, max depth 33 in 0.222s [251/500] 255 leaf nodes, max depth 29 in 0.222s [252/500] 255 leaf nodes, max depth 22 in 0.221s [253/500] 255 leaf nodes, max depth 23 in 0.220s [254/500] 255 leaf nodes, max depth 21 in 0.220s [255/500] 255 leaf nodes, max depth 21 in 0.219s [256/500] 255 leaf nodes, max depth 27 in 0.219s [257/500] 255 leaf nodes, max depth 24 in 0.219s [258/500] 255 leaf nodes, max depth 22 in 0.219s [259/500] 255 leaf nodes, max depth 20 in 0.218s [260/500] 255 leaf nodes, max depth 24 in 0.218s [261/500] 255 leaf nodes, max depth 24 in 0.217s [262/500] 255 leaf nodes, max depth 19 in 0.217s [263/500] 255 leaf nodes, max depth 23 in 0.217s [264/500] 255 leaf nodes, max depth 23 in 0.216s [265/500] 255 leaf nodes, max depth 21 in 0.216s [266/500] 255 leaf nodes, max depth 21 in 0.216s [267/500] 255 leaf nodes, max depth 22 in 0.215s [268/500] 255 leaf nodes, max depth 23 in 0.215s [269/500] 255 leaf nodes, max depth 24 in 0.214s [270/500] 255 leaf nodes, max depth 19 in 0.214s [271/500] 255 leaf nodes, max depth 16 in 0.213s [272/500] 255 leaf nodes, max depth 18 in 0.213s [273/500] 255 leaf nodes, max depth 17 in 0.213s [274/500] 255 leaf nodes, max depth 22 in 0.212s [275/500] 255 leaf nodes, max depth 24 in 0.212s [276/500] 255 leaf nodes, max depth 19 in 0.211s [277/500] 255 leaf nodes, max depth 24 in 0.211s [278/500] 255 leaf nodes, max depth 25 in 0.210s [279/500] 255 leaf nodes, max 
depth 16 in 0.210s [280/500] 255 leaf nodes, max depth 16 in 0.210s [281/500] 255 leaf nodes, max depth 18 in 0.210s [282/500] 255 leaf nodes, max depth 24 in 0.209s [283/500] 255 leaf nodes, max depth 22 in 0.209s [284/500] 255 leaf nodes, max depth 19 in 0.209s [285/500] 255 leaf nodes, max depth 27 in 0.209s [286/500] 255 leaf nodes, max depth 23 in 0.209s [287/500] 255 leaf nodes, max depth 22 in 0.208s [288/500] 255 leaf nodes, max depth 25 in 0.208s [289/500] 255 leaf nodes, max depth 21 in 0.207s [290/500] 255 leaf nodes, max depth 21 in 0.207s [291/500] 255 leaf nodes, max depth 28 in 0.207s [292/500] 255 leaf nodes, max depth 24 in 0.206s [293/500] 255 leaf nodes, max depth 23 in 0.206s [294/500] 255 leaf nodes, max depth 25 in 0.205s [295/500] 255 leaf nodes, max depth 23 in 0.205s [296/500] 255 leaf nodes, max depth 24 in 0.205s [297/500] 255 leaf nodes, max depth 24 in 0.204s [298/500] 255 leaf nodes, max depth 30 in 0.204s [299/500] 255 leaf nodes, max depth 22 in 0.203s [300/500] 255 leaf nodes, max depth 22 in 0.203s [301/500] 255 leaf nodes, max depth 20 in 0.203s [302/500] 255 leaf nodes, max depth 19 in 0.203s [303/500] 255 leaf nodes, max depth 20 in 0.203s [304/500] 255 leaf nodes, max depth 25 in 0.202s [305/500] 255 leaf nodes, max depth 24 in 0.202s [306/500] 255 leaf nodes, max depth 24 in 0.202s [307/500] 255 leaf nodes, max depth 17 in 0.201s [308/500] 255 leaf nodes, max depth 25 in 0.201s [309/500] 255 leaf nodes, max depth 20 in 0.201s [310/500] 255 leaf nodes, max depth 19 in 0.200s [311/500] 255 leaf nodes, max depth 19 in 0.200s [312/500] 255 leaf nodes, max depth 18 in 0.200s [313/500] 255 leaf nodes, max depth 22 in 0.199s [314/500] 255 leaf nodes, max depth 19 in 0.199s [315/500] 255 leaf nodes, max depth 17 in 0.199s [316/500] 255 leaf nodes, max depth 18 in 0.198s [317/500] 255 leaf nodes, max depth 17 in 0.198s [318/500] 255 leaf nodes, max depth 19 in 0.198s [319/500] 255 leaf nodes, max depth 20 in 0.198s [320/500] 255 leaf 
nodes, max depth 17 in 0.197s [321/500] 255 leaf nodes, max depth 17 in 0.197s [322/500] 255 leaf nodes, max depth 20 in 0.197s [323/500] 255 leaf nodes, max depth 17 in 0.197s [324/500] 255 leaf nodes, max depth 19 in 0.197s [325/500] 255 leaf nodes, max depth 22 in 0.197s [326/500] 255 leaf nodes, max depth 21 in 0.196s [327/500] 255 leaf nodes, max depth 18 in 0.196s [328/500] 255 leaf nodes, max depth 17 in 0.196s [329/500] 255 leaf nodes, max depth 18 in 0.196s [330/500] 255 leaf nodes, max depth 20 in 0.196s [331/500] 255 leaf nodes, max depth 18 in 0.195s [332/500] 255 leaf nodes, max depth 20 in 0.195s [333/500] 255 leaf nodes, max depth 20 in 0.195s [334/500] 255 leaf nodes, max depth 18 in 0.195s [335/500] 255 leaf nodes, max depth 20 in 0.194s [336/500] 255 leaf nodes, max depth 20 in 0.194s [337/500] 255 leaf nodes, max depth 22 in 0.194s [338/500] 255 leaf nodes, max depth 19 in 0.194s [339/500] 255 leaf nodes, max depth 21 in 0.193s [340/500] 255 leaf nodes, max depth 20 in 0.193s [341/500] 255 leaf nodes, max depth 18 in 0.193s [342/500] 255 leaf nodes, max depth 19 in 0.192s [343/500] 255 leaf nodes, max depth 20 in 0.192s [344/500] 255 leaf nodes, max depth 19 in 0.192s [345/500] 255 leaf nodes, max depth 21 in 0.191s [346/500] 255 leaf nodes, max depth 23 in 0.191s [347/500] 255 leaf nodes, max depth 19 in 0.191s [348/500] 255 leaf nodes, max depth 18 in 0.191s [349/500] 255 leaf nodes, max depth 18 in 0.190s [350/500] 255 leaf nodes, max depth 18 in 0.191s [351/500] 255 leaf nodes, max depth 21 in 0.191s [352/500] 255 leaf nodes, max depth 26 in 0.191s [353/500] 255 leaf nodes, max depth 28 in 0.190s [354/500] 255 leaf nodes, max depth 26 in 0.190s [355/500] 255 leaf nodes, max depth 36 in 0.190s [356/500] 255 leaf nodes, max depth 29 in 0.189s [357/500] 255 leaf nodes, max depth 32 in 0.189s [358/500] 255 leaf nodes, max depth 25 in 0.189s [359/500] 255 leaf nodes, max depth 20 in 0.189s [360/500] 255 leaf nodes, max depth 25 in 0.189s [361/500] 
255 leaf nodes, max depth 23 in 0.188s [362/500] 255 leaf nodes, max depth 23 in 0.188s [363/500] 255 leaf nodes, max depth 27 in 0.188s [364/500] 255 leaf nodes, max depth 23 in 0.187s [365/500] 255 leaf nodes, max depth 23 in 0.187s [366/500] 255 leaf nodes, max depth 32 in 0.187s [367/500] 255 leaf nodes, max depth 20 in 0.187s [368/500] 255 leaf nodes, max depth 21 in 0.187s [369/500] 255 leaf nodes, max depth 20 in 0.187s [370/500] 255 leaf nodes, max depth 18 in 0.186s [371/500] 255 leaf nodes, max depth 32 in 0.186s [372/500] 255 leaf nodes, max depth 32 in 0.186s [373/500] 255 leaf nodes, max depth 21 in 0.186s [374/500] 255 leaf nodes, max depth 23 in 0.185s [375/500] 255 leaf nodes, max depth 22 in 0.185s [376/500] 255 leaf nodes, max depth 23 in 0.185s [377/500] 255 leaf nodes, max depth 25 in 0.185s [378/500] 255 leaf nodes, max depth 24 in 0.184s [379/500] 255 leaf nodes, max depth 23 in 0.184s [380/500] 255 leaf nodes, max depth 28 in 0.184s [381/500] 255 leaf nodes, max depth 21 in 0.184s [382/500] 255 leaf nodes, max depth 24 in 0.183s [383/500] 255 leaf nodes, max depth 24 in 0.183s [384/500] 255 leaf nodes, max depth 21 in 0.183s [385/500] 255 leaf nodes, max depth 24 in 0.183s [386/500] 255 leaf nodes, max depth 26 in 0.182s [387/500] 255 leaf nodes, max depth 24 in 0.182s [388/500] 255 leaf nodes, max depth 32 in 0.182s [389/500] 255 leaf nodes, max depth 29 in 0.182s [390/500] 255 leaf nodes, max depth 19 in 0.181s [391/500] 255 leaf nodes, max depth 20 in 0.181s [392/500] 255 leaf nodes, max depth 23 in 0.181s [393/500] 255 leaf nodes, max depth 25 in 0.181s [394/500] 255 leaf nodes, max depth 29 in 0.181s [395/500] 255 leaf nodes, max depth 25 in 0.180s [396/500] 255 leaf nodes, max depth 20 in 0.180s [397/500] 255 leaf nodes, max depth 23 in 0.180s [398/500] 255 leaf nodes, max depth 23 in 0.180s [399/500] 255 leaf nodes, max depth 18 in 0.180s [400/500] 255 leaf nodes, max depth 22 in 0.179s [401/500] 255 leaf nodes, max depth 19 in 0.179s 
[402/500] 255 leaf nodes, max depth 22 in 0.179s [403/500] 255 leaf nodes, max depth 25 in 0.179s [404/500] 255 leaf nodes, max depth 29 in 0.178s [405/500] 255 leaf nodes, max depth 25 in 0.178s [406/500] 255 leaf nodes, max depth 26 in 0.178s [407/500] 255 leaf nodes, max depth 25 in 0.178s [408/500] 255 leaf nodes, max depth 30 in 0.177s [409/500] 255 leaf nodes, max depth 30 in 0.177s [410/500] 255 leaf nodes, max depth 25 in 0.177s [411/500] 255 leaf nodes, max depth 30 in 0.177s [412/500] 255 leaf nodes, max depth 23 in 0.177s [413/500] 255 leaf nodes, max depth 24 in 0.176s [414/500] 255 leaf nodes, max depth 23 in 0.176s [415/500] 255 leaf nodes, max depth 24 in 0.176s [416/500] 255 leaf nodes, max depth 26 in 0.176s [417/500] 255 leaf nodes, max depth 27 in 0.176s [418/500] 255 leaf nodes, max depth 21 in 0.175s [419/500] 255 leaf nodes, max depth 27 in 0.175s [420/500] 255 leaf nodes, max depth 26 in 0.175s [421/500] 255 leaf nodes, max depth 28 in 0.175s [422/500] 255 leaf nodes, max depth 26 in 0.175s [423/500] 255 leaf nodes, max depth 25 in 0.174s [424/500] 255 leaf nodes, max depth 32 in 0.174s [425/500] 255 leaf nodes, max depth 25 in 0.174s [426/500] 255 leaf nodes, max depth 31 in 0.174s [427/500] 255 leaf nodes, max depth 35 in 0.173s [428/500] 255 leaf nodes, max depth 29 in 0.173s [429/500] 255 leaf nodes, max depth 28 in 0.173s [430/500] 255 leaf nodes, max depth 32 in 0.173s [431/500] 255 leaf nodes, max depth 25 in 0.173s [432/500] 255 leaf nodes, max depth 27 in 0.172s [433/500] 255 leaf nodes, max depth 31 in 0.172s [434/500] 255 leaf nodes, max depth 32 in 0.172s [435/500] 255 leaf nodes, max depth 24 in 0.172s [436/500] 255 leaf nodes, max depth 29 in 0.172s [437/500] 255 leaf nodes, max depth 25 in 0.171s [438/500] 255 leaf nodes, max depth 22 in 0.171s [439/500] 255 leaf nodes, max depth 25 in 0.171s [440/500] 255 leaf nodes, max depth 23 in 0.171s [441/500] 255 leaf nodes, max depth 21 in 0.171s [442/500] 255 leaf nodes, max depth 23 
in 0.171s [443/500] 255 leaf nodes, max depth 23 in 0.170s [444/500] 255 leaf nodes, max depth 21 in 0.170s [445/500] 255 leaf nodes, max depth 34 in 0.170s [446/500] 255 leaf nodes, max depth 24 in 0.170s [447/500] 255 leaf nodes, max depth 22 in 0.170s [448/500] 255 leaf nodes, max depth 19 in 0.170s [449/500] 255 leaf nodes, max depth 26 in 0.170s [450/500] 255 leaf nodes, max depth 26 in 0.170s [451/500] 255 leaf nodes, max depth 24 in 0.169s [452/500] 255 leaf nodes, max depth 24 in 0.169s [453/500] 255 leaf nodes, max depth 21 in 0.169s [454/500] 255 leaf nodes, max depth 20 in 0.169s [455/500] 255 leaf nodes, max depth 21 in 0.169s [456/500] 255 leaf nodes, max depth 23 in 0.169s [457/500] 255 leaf nodes, max depth 20 in 0.169s [458/500] 255 leaf nodes, max depth 25 in 0.169s [459/500] 255 leaf nodes, max depth 22 in 0.168s [460/500] 255 leaf nodes, max depth 27 in 0.168s [461/500] 255 leaf nodes, max depth 22 in 0.168s [462/500] 255 leaf nodes, max depth 21 in 0.168s [463/500] 255 leaf nodes, max depth 25 in 0.168s [464/500] 255 leaf nodes, max depth 26 in 0.168s [465/500] 255 leaf nodes, max depth 24 in 0.168s [466/500] 255 leaf nodes, max depth 21 in 0.167s [467/500] 255 leaf nodes, max depth 29 in 0.167s [468/500] 255 leaf nodes, max depth 19 in 0.167s [469/500] 255 leaf nodes, max depth 17 in 0.167s [470/500] 255 leaf nodes, max depth 22 in 0.167s [471/500] 255 leaf nodes, max depth 28 in 0.167s [472/500] 255 leaf nodes, max depth 19 in 0.166s [473/500] 255 leaf nodes, max depth 20 in 0.167s [474/500] 255 leaf nodes, max depth 24 in 0.167s [475/500] 255 leaf nodes, max depth 20 in 0.166s [476/500] 255 leaf nodes, max depth 21 in 0.166s [477/500] 255 leaf nodes, max depth 18 in 0.166s [478/500] 255 leaf nodes, max depth 19 in 0.166s [479/500] 255 leaf nodes, max depth 21 in 0.166s [480/500] 255 leaf nodes, max depth 21 in 0.166s [481/500] 255 leaf nodes, max depth 21 in 0.166s [482/500] 255 leaf nodes, max depth 24 in 0.166s [483/500] 255 leaf nodes, max 
depth 23 in 0.166s [484/500] 255 leaf nodes, max depth 27 in 0.166s [485/500] 255 leaf nodes, max depth 20 in 0.165s [486/500] 255 leaf nodes, max depth 21 in 0.165s [487/500] 255 leaf nodes, max depth 24 in 0.165s [488/500] 255 leaf nodes, max depth 21 in 0.165s [489/500] 255 leaf nodes, max depth 22 in 0.165s [490/500] 255 leaf nodes, max depth 23 in 0.165s [491/500] 255 leaf nodes, max depth 23 in 0.165s [492/500] 255 leaf nodes, max depth 25 in 0.165s [493/500] 255 leaf nodes, max depth 20 in 0.165s [494/500] 255 leaf nodes, max depth 22 in 0.164s [495/500] 255 leaf nodes, max depth 27 in 0.164s [496/500] 255 leaf nodes, max depth 21 in 0.164s [497/500] 255 leaf nodes, max depth 21 in 0.164s [498/500] 255 leaf nodes, max depth 26 in 0.164s [499/500] 255 leaf nodes, max depth 24 in 0.164s [500/500] 255 leaf nodes, max depth 23 in 0.164s Fit 500 trees in 83.011 s, (127500 total leaf nodes) Time spent finding best splits: 57.967s Time spent applying splits: 11.217s Time spent predicting: 4.851s done in 83.022s, ROC AUC: 0.8156 Threading layer chosen: tbb ```
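The timings at the end of the log can be turned into shares of the total fit time with a few lines of arithmetic. This is a small sketch that only re-uses numbers copied from the log above. The variable and function names are made up for illustration.

```python
# Timings copied from the log above (seconds).
fit_total = 83.011
phases = {
    "finding best splits": 57.967,
    "applying splits": 11.217,
    "predicting": 4.851,
}
lightgbm_total = 75.408
pygbm_total = 83.022


def percent(part, whole):
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


for name, seconds in phases.items():
    print(f"{name}: {percent(seconds, fit_total):.1f}% of fit time")

# Time that the log does not attribute to any of the three phases.
other = fit_total - sum(phases.values())
print(f"not itemized: {other:.3f}s ({percent(other, fit_total):.1f}%)")
print(f"pygbm / LightGBM wall time: {pygbm_total / lightgbm_total:.3f}")
```

Split finding dominates the fit time in this run, which suggests it is the first place to look when trying to close the remaining gap with LightGBM.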

Also, I tried to reproduce the leak with a minimal example and this is what I came up with. It's pretty weird and I'd like to know if you can reproduce it before submitting it to numba because I feel like I'm tripping:

```python
import numpy as np
import psutil
from numba import (njit, jitclass, prange, float32, uint8, uint32, typeof,
                   optional)


@jitclass([
    ('attr', uint32),
])
class JitClass:
    def __init__(self):
        self.attr = 3


@njit
def f():
    cs = [JitClass() for i in range(1000)]

    array = np.empty(shape=10, dtype=np.uint32)
    # If I remove this loop, there is no leak
    for i, c in enumerate(cs):
        c.attr[i] = array[i]  # this should not even pass!
        # c.attr = array[i]  # <-- leak still here if we do this instead.

        # this should not work either!
        something_that_should_not_compile
        blahblahblah
        why_is_this_passing
        # a = a + 3 <-- this produces an error, as expected (a is not defined)

    return array


class C:
    def g(self):
        self.array = f()


p = psutil.Process()
for _ in range(10000):
    o = C()
    o.g()
    del o
    # leak proportional to the size of cs, independent to the size of array
    print(f"{p.memory_info().rss / 1e6} MB")
```

@ogrisel I'm happy to merge as is, but I'd like your input on the 1e7 case first.

ogrisel commented 6 years ago

Cool, let's merge as it's already a net improvement.

About the minimal reproduction case, I confirm I get the leak with your code, without any error message or exception. Just memory usage increasing as reported by psutil.
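A leak of this kind can be quantified as average memory growth per call rather than by reading the printed values by eye. The following is a minimal sketch with hypothetical helper names. The memory reader is injectable so the logic can be checked without psutil, and the default reader uses the same `psutil.Process().memory_info().rss` call as the reproduction script.

```python
import gc


def psutil_rss():
    """Resident set size of this process in bytes (requires psutil)."""
    import psutil  # imported lazily so the helper below has no hard dependency
    return psutil.Process().memory_info().rss


def growth_per_call(fn, n_calls, read_memory=psutil_rss, warmup=1):
    """Average memory growth in bytes per call of `fn`."""
    for _ in range(warmup):
        fn()  # e.g. trigger JIT compilation before measuring
    gc.collect()
    before = read_memory()
    for _ in range(n_calls):
        fn()
    gc.collect()
    after = read_memory()
    return (after - before) / n_calls
```

With the reproduction script, something like `growth_per_call(lambda: C().g(), 10000)` would be expected to stay clearly above zero while the leak is present. RSS is a noisy signal, so small values should not be over-interpreted.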

ogrisel commented 6 years ago

We still have a discrepancy in terms of results with LightGBM though. But there is another issue for that.

NicolasHug commented 6 years ago

I have opened https://github.com/numba/numba/issues/3473 and https://github.com/numba/numba/issues/3472 regarding the leak and some other weird stuff I found.

ogrisel / pygbm

[WIP] Do not store histogram on SplitInfo #36
