szilard / GBM-perf

Performance of various open source GBM implementations
MIT License

TensorFlow Decision Forests #53

Open szilard opened 3 years ago

szilard commented 3 years ago

https://blog.tensorflow.org/2021/05/introducing-tensorflow-decision-forests.html

szilard commented 3 years ago
docker run --rm  -ti continuumio/anaconda3 /bin/bash

pip install tensorflow_decision_forests

ipython
szilard commented 3 years ago
import tensorflow_decision_forests as tfdf

import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn import metrics

d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")
d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)
d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)

dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")

md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1)
%time md.fit(x=dtf_train)

y_pred = md.predict(dtf_test)   
print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))
szilard commented 3 years ago

m5.2xlarge (8 cores)

In [1]: import tensorflow_decision_forests as tfdf
2021-06-04 16:39:24.583254: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-06-04 16:39:24.583295: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: import tensorflow as tf

In [5]: from sklearn import metrics

In [6]: d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")

In [7]: d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

In [8]: d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)

In [9]: d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)

In [10]: dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
2021-06-04 16:39:32.461417: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-06-04 16:39:32.461464: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-06-04 16:39:32.461493: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (78cd809fe258): /proc/driver/nvidia/version does not exist
2021-06-04 16:39:32.461787: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

In [11]: dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")

In [12]: md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1)

In [13]: %time md.fit(x=dtf_train)
2021-06-04 16:39:36.183058: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-06-04 16:39:36.204576: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2499980000 Hz
15625/15625 [==============================] - 15s 780us/step
[INFO kernel.cc:746] Start Yggdrasil model training
[INFO kernel.cc:747] Collect training examples
[INFO kernel.cc:392] Number of batches: 15625
[INFO kernel.cc:393] Number of examples: 1000000
[INFO data_spec_inference.cc:289] 3 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Dest (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO data_spec_inference.cc:289] 2 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Origin (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO kernel.cc:769] Dataset:
Number of records: 1000000
Number of columns: 9

Number of columns by type:
        CATEGORICAL: 7 (77.7778%)
        NUMERICAL: 2 (22.2222%)

Columns:

CATEGORICAL: 7 (77.7778%)
        0: "DayOfWeek" CATEGORICAL has-dict vocab-size:8 zero-ood-items most-frequent:"c-5" 147674 (14.7674%)
        1: "DayofMonth" CATEGORICAL has-dict vocab-size:32 zero-ood-items most-frequent:"c-17" 33733 (3.3733%)
        3: "Dest" CATEGORICAL has-dict vocab-size:290 num-oods:3 (0.0003%) most-frequent:"ATL" 58247 (5.8247%)
        5: "Month" CATEGORICAL has-dict vocab-size:13 zero-ood-items most-frequent:"c-8" 88344 (8.8344%)
        6: "Origin" CATEGORICAL has-dict vocab-size:290 num-oods:2 (0.0002%) most-frequent:"ATL" 58796 (5.8796%)
        7: "UniqueCarrier" CATEGORICAL has-dict vocab-size:23 zero-ood-items most-frequent:"WN" 150937 (15.0937%)
        8: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item

NUMERICAL: 2 (22.2222%)
        2: "DepTime" NUMERICAL mean:1343.12 min:1 max:2615 sd:476.663
        4: "Distance" NUMERICAL mean:728.805 min:21 max:4962 sd:574.475

Terminology:
        nas: Number of non-available (i.e. missing) values.
        ood: Out of dictionary.
        manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
        tokenized: The attribute value is obtained through tokenization.
        has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
        vocab-size: Number of unique values.

[INFO kernel.cc:772] Configure learner
[WARNING gradient_boosted_trees.cc:1532] Subsample hyperparameter given but sampling method does not match.
[WARNING gradient_boosted_trees.cc:1545] GOSS alpha hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1554] GOSS beta hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1566] SelGB ratio hyperparameter given but SelGB is disabled.
[INFO kernel.cc:797] Training config:
learner: "GRADIENT_BOOSTED_TREES"
features: "DayOfWeek"
features: "DayofMonth"
features: "DepTime"
features: "Dest"
features: "Distance"
features: "Month"
features: "Origin"
features: "UniqueCarrier"
label: "__LABEL"
task: CLASSIFICATION
[yggdrasil_decision_forests.model.gradient_boosted_trees.proto.gradient_boosted_trees_config] {
  num_trees: 100
  decision_tree {
    max_depth: 10
    min_examples: 5
    in_split_min_examples_check: true
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    num_candidate_attributes_ratio: -1
    axis_aligned_split {
    }
  }
  shrinkage: 0.1
  validation_set_ratio: 0.1
  early_stopping: VALIDATION_LOSS_INCREASE
  early_stopping_num_trees_look_ahead: 30
  l2_regularization: 0
  lambda_loss: 1
  mart {
  }
  adapt_subsample_for_maximum_training_duration: false
  l1_regularization: 0
  use_hessian_gain: false
  l2_regularization_categorical: 1
}

[INFO kernel.cc:800] Deployment config:

[INFO kernel.cc:837] Train model
[INFO gradient_boosted_trees.cc:480] Default loss set to BINOMIAL_LOG_LIKELIHOOD
[INFO gradient_boosted_trees.cc:1358]   num-trees:1 train-loss:0.952696 train-accuracy:0.806957 valid-loss:0.954296 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:2 train-loss:0.930766 train-accuracy:0.806957 valid-loss:0.935331 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:28 train-loss:0.759611 train-accuracy:0.837750 valid-loss:0.817667 valid-accuracy:0.827559
[INFO gradient_boosted_trees.cc:1360]   num-trees:56 train-loss:0.695891 train-accuracy:0.852399 valid-loss:0.791661 valid-accuracy:0.834203
[INFO gradient_boosted_trees.cc:1360]   num-trees:86 train-loss:0.650648 train-accuracy:0.864072 valid-loss:0.772629 valid-accuracy:0.838609
[INFO gradient_boosted_trees.cc:1358]   num-trees:100 train-loss:0.633199 train-accuracy:0.868094 valid-loss:0.766061 valid-accuracy:0.839678
[INFO gradient_boosted_trees.cc:319] Truncates the model to 100 tree(s) i.e. 100  iteration(s).
[INFO gradient_boosted_trees.cc:348] Final model valid-loss:0.766061 valid-accuracy:0.839678
[INFO kernel.cc:856] Export model in log directory: /tmp/tmpkhka91x3
[INFO kernel.cc:864] Save model in resources
[INFO kernel.cc:929] Loading model from path
[INFO decision_forest.cc:590] Model loaded with 100 root(s), 93196 node(s), and 8 input feature(s).
[INFO abstract_model.cc:876] Engine "GradientBoostedTreesGeneric" built
[INFO kernel.cc:797] Use fast generic engine
CPU times: user 3min 55s, sys: 6.88 s, total: 4min 2s
Wall time: 2min 6s
Out[13]: <tensorflow.python.keras.callbacks.History at 0x7f88aaaca0d0>

In [14]: y_pred = md.predict(dtf_test)

In [15]: print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))
0.7612733258837148
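
The %time output above also gives a rough sense of how many cores the training kept busy. A quick back-of-the-envelope sketch, using only numbers copied from the log (the helper to_seconds is made up for this illustration):

```python
# Rough estimate of average CPU parallelism from the %time output above.
# All numbers are copied from the log; to_seconds is a hypothetical helper.

def to_seconds(minutes, seconds):
    """Convert an 'Xmin Ys' reading into seconds."""
    return 60 * minutes + seconds

cpu_user = to_seconds(3, 55)   # "user 3min 55s"
cpu_sys = 6.88                 # "sys 6.88 s"
wall = to_seconds(2, 6)        # "Wall time: 2min 6s"

# CPU seconds consumed per second of wall time
avg_busy_cores = (cpu_user + cpu_sys) / wall
print(f"average busy cores: {avg_busy_cores:.2f} (instance has 8)")

# Batch size implied by "Number of examples: 1000000" / "Number of batches: 15625"
implied_batch_size = 1_000_000 // 15_625
print(f"implied batch size: {implied_batch_size}")
```

Read this way, fewer than 2 of the 8 cores seem to be busy on average, which suggests the run is far from saturating the machine. This is only a coarse reading of the timing output, not a profile.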
szilard commented 3 years ago

Summary:

m5.2xlarge (8 cores)

Wall time: 2min 6s

In [15]: print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))
0.7612733258837148
szilard commented 3 years ago

In comparison, XGBoost (m5.2xlarge):

5.696 s (training time)
0.7478858 (AUC)

(XGBoost is roughly 20x faster: about 5.7 s vs 126 s)

szilard commented 3 years ago

GPU:

p3.2xlarge

nvidia-docker run -it --rm tensorflow/tensorflow:latest-gpu-jupyter bash

pip install tensorflow_decision_forests scikit-learn

ipython
In [1]: import tensorflow_decision_forests as tfdf
2021-06-04 19:08:30.923089: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: import tensorflow as tf

In [5]: from sklearn import metrics

In [6]: d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")

In [7]: d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

In [8]: d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)

In [9]: d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)

In [10]: dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
2021-06-04 19:08:40.281591: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-04 19:08:41.264152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.265175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-06-04 19:08:41.265220: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-04 19:08:41.268516: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-04 19:08:41.268583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-04 19:08:41.269670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-06-04 19:08:41.269984: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-06-04 19:08:41.270925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-06-04 19:08:41.271691: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-06-04 19:08:41.271938: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-04 19:08:41.272066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.273113: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.274059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-06-04 19:08:41.274467: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-04 19:08:41.275021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.275996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-06-04 19:08:41.276119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.277162: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.278101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-06-04 19:08:41.278156: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-04 19:08:42.672460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-04 19:08:42.672513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2021-06-04 19:08:42.672524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2021-06-04 19:08:42.672786: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:42.673838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:42.674860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:42.675833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14644 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)

In [11]: dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")

In [12]: md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1)

In [13]: %time md.fit(x=dtf_train)
2021-06-04 19:09:28.430384: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-06-04 19:09:28.452532: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2300020000 Hz
15625/15625 [==============================] - 27s 1ms/step
[INFO kernel.cc:746] Start Yggdrasil model training
[INFO kernel.cc:747] Collect training examples
[INFO kernel.cc:392] Number of batches: 15625
[INFO kernel.cc:393] Number of examples: 1000000
[INFO data_spec_inference.cc:289] 3 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Dest (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO data_spec_inference.cc:289] 2 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Origin (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO kernel.cc:769] Dataset:
Number of records: 1000000
Number of columns: 9

Number of columns by type:
        CATEGORICAL: 7 (77.7778%)
        NUMERICAL: 2 (22.2222%)

Columns:

CATEGORICAL: 7 (77.7778%)
        0: "DayOfWeek" CATEGORICAL has-dict vocab-size:8 zero-ood-items most-frequent:"c-5" 147674 (14.7674%)
        1: "DayofMonth" CATEGORICAL has-dict vocab-size:32 zero-ood-items most-frequent:"c-17" 33733 (3.3733%)
        3: "Dest" CATEGORICAL has-dict vocab-size:290 num-oods:3 (0.0003%) most-frequent:"ATL" 58247 (5.8247%)
        5: "Month" CATEGORICAL has-dict vocab-size:13 zero-ood-items most-frequent:"c-8" 88344 (8.8344%)
        6: "Origin" CATEGORICAL has-dict vocab-size:290 num-oods:2 (0.0002%) most-frequent:"ATL" 58796 (5.8796%)
        7: "UniqueCarrier" CATEGORICAL has-dict vocab-size:23 zero-ood-items most-frequent:"WN" 150937 (15.0937%)
        8: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item

NUMERICAL: 2 (22.2222%)
        2: "DepTime" NUMERICAL mean:1343.12 min:1 max:2615 sd:476.663
        4: "Distance" NUMERICAL mean:728.805 min:21 max:4962 sd:574.475

Terminology:
        nas: Number of non-available (i.e. missing) values.
        ood: Out of dictionary.
        manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
        tokenized: The attribute value is obtained through tokenization.
        has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
        vocab-size: Number of unique values.

[INFO kernel.cc:772] Configure learner
[WARNING gradient_boosted_trees.cc:1532] Subsample hyperparameter given but sampling method does not match.
[WARNING gradient_boosted_trees.cc:1545] GOSS alpha hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1554] GOSS beta hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1566] SelGB ratio hyperparameter given but SelGB is disabled.
[INFO kernel.cc:797] Training config:
learner: "GRADIENT_BOOSTED_TREES"
features: "DayOfWeek"
features: "DayofMonth"
features: "DepTime"
features: "Dest"
features: "Distance"
features: "Month"
features: "Origin"
features: "UniqueCarrier"
label: "__LABEL"
task: CLASSIFICATION
[yggdrasil_decision_forests.model.gradient_boosted_trees.proto.gradient_boosted_trees_config] {
  num_trees: 100
  decision_tree {
    max_depth: 10
    min_examples: 5
    in_split_min_examples_check: true
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    num_candidate_attributes_ratio: -1
    axis_aligned_split {
    }
  }
  shrinkage: 0.1
  validation_set_ratio: 0.1
  early_stopping: VALIDATION_LOSS_INCREASE
  early_stopping_num_trees_look_ahead: 30
  l2_regularization: 0
  lambda_loss: 1
  mart {
  }
  adapt_subsample_for_maximum_training_duration: false
  l1_regularization: 0
  use_hessian_gain: false
  l2_regularization_categorical: 1
}

[INFO kernel.cc:800] Deployment config:

[INFO kernel.cc:837] Train model
[INFO gradient_boosted_trees.cc:480] Default loss set to BINOMIAL_LOG_LIKELIHOOD
[INFO gradient_boosted_trees.cc:1358]   num-trees:1 train-loss:0.952696 train-accuracy:0.806957 valid-loss:0.954296 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:2 train-loss:0.930766 train-accuracy:0.806957 valid-loss:0.935331 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:28 train-loss:0.759611 train-accuracy:0.837750 valid-loss:0.817667 valid-accuracy:0.827559
[INFO gradient_boosted_trees.cc:1360]   num-trees:55 train-loss:0.697795 train-accuracy:0.851977 valid-loss:0.792125 valid-accuracy:0.833853
[INFO gradient_boosted_trees.cc:1360]   num-trees:83 train-loss:0.655071 train-accuracy:0.862715 valid-loss:0.774906 valid-accuracy:0.837899
[INFO gradient_boosted_trees.cc:1358]   num-trees:100 train-loss:0.633199 train-accuracy:0.868094 valid-loss:0.766061 valid-accuracy:0.839678
[INFO gradient_boosted_trees.cc:319] Truncates the model to 100 tree(s) i.e. 100  iteration(s).
[INFO gradient_boosted_trees.cc:348] Final model valid-loss:0.766061 valid-accuracy:0.839678
[INFO kernel.cc:856] Export model in log directory: /tmp/tmp4a7ekm_n
[INFO kernel.cc:864] Save model in resources
[INFO kernel.cc:929] Loading model from path
[INFO decision_forest.cc:590] Model loaded with 100 root(s), 93196 node(s), and 8 input feature(s).
[INFO abstract_model.cc:876] Engine "GradientBoostedTreesGeneric" built
[INFO kernel.cc:797] Use fast generic engine
CPU times: user 4min 41s, sys: 8.69 s, total: 4min 50s
Wall time: 2min 22s
Out[13]: <tensorflow.python.keras.callbacks.History at 0x7f6777917048>
szilard commented 3 years ago

Not using GPU?

dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min") allocates something on the GPU (about 465 MB):

[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 35'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)

then md.fit(x=dtf_train)

[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)

This starts something on the GPU (calculation of stats etc.), but once the trees start being built, the GPU is no longer used:

[INFO gradient_boosted_trees.cc:1358]   num-trees:1 train-loss:0.952696 train-accuracy:0.806957 valid-loss:0.954296 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:2 train-loss:0.930766 train-accuracy:0.806957 valid-loss:0.935331 valid-accuracy:0.807567
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
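
To make the pattern in the monitor output easier to see, here is a small sketch that parses two of the lines above and compares GPU utilization with the share of GPU memory in use. The regular expression is an assumption about this particular output format, and parse_line is a made-up helper name.

```python
import re

# Two sample lines copied from the monitor output above:
# one after creating the dataset, one while md.fit() is running.
LINES = [
    "[0] Tesla V100-SXM2-16GB | 36'C,   0 % |   465 / 16160 MB | root(463M)",
    "[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)",
]

# Assumed layout: "<util> % | <used> / <total> MB"
PATTERN = re.compile(r"(\d+)\s*%\s*\|\s*(\d+)\s*/\s*(\d+)\s*MB")

def parse_line(line):
    """Return (utilization in %, fraction of GPU memory in use) for one line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    util, used, total = (int(g) for g in match.groups())
    return util, used / total

for line in LINES:
    util, mem_frac = parse_line(line)
    print(f"utilization {util:>3d} %, memory in use {mem_frac:.0%}")
```

Memory use jumps to most of the card while utilization stays at 0-2 %. That fits TensorFlow reserving nearly all GPU memory by default (the log earlier reports a device created with 14644 MB) rather than the tree training doing work on the GPU.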


Laurae2 commented 3 years ago

Seems GPU is not supported:

szilard commented 3 years ago

Yeah, I was about to post that, quite hilarious.

szilard commented 3 years ago

Added early_stopping="NONE" to prevent early stopping for small data size:

import tensorflow_decision_forests as tfdf

import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn import metrics

d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-0.1m.csv")
d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)
d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)

dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")

md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1, early_stopping="NONE")
%time md.fit(x=dtf_train)

y_pred = md.predict(dtf_test)   
print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))
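
For context on why early_stopping="NONE" matters here: the training config printed in the earlier logs shows early_stopping: VALIDATION_LOSS_INCREASE, early_stopping_num_trees_look_ahead: 30 and validation_set_ratio: 0.1, so by default training can presumably end before num_trees is reached. Below is a generic sketch of look-ahead early stopping in plain Python. It is not the Yggdrasil implementation; the function early_stop and the toy loss values are invented for illustration.

```python
# Generic look-ahead ("patience") early stopping on a validation-loss curve.
# Illustration only: NOT the Yggdrasil implementation. The function name
# and the toy loss values below are hypothetical.

def early_stop(valid_losses, look_ahead):
    """Return (trees_trained, best_iteration).

    Training halts once the best validation loss has not improved
    for `look_ahead` consecutive trees.
    """
    best_loss = float("inf")
    best_iter = 0
    trained = 0
    for trained, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_loss, best_iter = loss, trained
        elif trained - best_iter >= look_ahead:
            break  # no improvement for `look_ahead` trees in a row
    return trained, best_iter

# Toy curve: improves for 4 trees, then gets worse.
toy_losses = [0.95, 0.90, 0.87, 0.86, 0.88, 0.89, 0.91, 0.92]

print(early_stop(toy_losses, look_ahead=3))   # stops before the end of the curve
print(early_stop(toy_losses, look_ahead=30))  # look-ahead never triggers
```

With a rule like this, a run on small data could train fewer than the requested 100 trees, which would make the timings hard to compare across libraries.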
szilard commented 3 years ago

m5.4xlarge (16 cores)

TF-DF:

| size | time [s] | AUC |
|------|----------|-------|
| 100K | 16 | 0.704 |
| 1M | 110 | 0.761 |
| 10M | 1400 | 0.774 |

XGBoost:

| size | time [s] | AUC |
|------|----------|-------|
| 100K | 0.6 | 0.734 |
| 1M | 3.5 | 0.748 |
| 10M | 35 | 0.754 |

LightGBM:

| size | time [s] | AUC |
|------|----------|-------|
| 100K | 2 | 0.717 |
| 1M | 4 | 0.765 |
| 10M | 20 | 0.792 |

How much slower:

| size | TF-DF/XGBoost | TF-DF/LightGBM |
|------|---------------|----------------|
| 100K | 25x | 8x |
| 1M | 30x | 27x |
| 10M | 40x | 70x |
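
The slowdown factors are simply the ratios of the training times in the tables above. A minimal sketch that recomputes them from the reported times (the dictionary layout and the slowdown helper are just for illustration):

```python
# Training times in seconds, copied from the tables above (m5.4xlarge).
times = {
    "TF-DF":    {"100K": 16,  "1M": 110, "10M": 1400},
    "XGBoost":  {"100K": 0.6, "1M": 3.5, "10M": 35},
    "LightGBM": {"100K": 2,   "1M": 4,   "10M": 20},
}

def slowdown(baseline, size):
    """How many times slower TF-DF is than `baseline` at a given data size."""
    return times["TF-DF"][size] / times[baseline][size]

for size in ("100K", "1M", "10M"):
    print(f"{size:>4}: {slowdown('XGBoost', size):5.1f}x vs XGBoost, "
          f"{slowdown('LightGBM', size):5.1f}x vs LightGBM")
```

The figures in the table are rounded versions of these ratios.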