rapidsai / gpu-bdb

[CPU] ML Portion for GPU-BDB Queries #248

Open VibhuJawa opened 2 years ago

VibhuJawa commented 2 years ago

The queries below rely on cuML models for the ML portion on GPU. For the CPU path, we need to decide between a distributed (dask-ml) and a non-distributed (scikit-learn) implementation of the ML portion of these queries, depending on performance. I suggest benchmarking both and choosing whichever performs best.

Query-05 GPU: cuml.LogisticRegression

  1. Non-distributed CPU: sklearn.linear_model.LogisticRegression
  2. Distributed CPU: dask_ml.linear_model.LogisticRegression
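
A minimal sketch of the two CPU paths for the Query-05 ML step, assuming the features and labels are already available as Dask collections (the names X_dd/y_dd and the max_iter value are placeholders, not taken from the actual query code):

```python
# Hypothetical illustration of the two CPU options for the Query-05 ML step.
# X_dd / y_dd stand in for whatever feature/label collections the query
# builds; they are placeholders, not the actual gpu-bdb column names.
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from dask_ml.linear_model import LogisticRegression as DaskLogisticRegression


def fit_non_distributed(X_dd, y_dd):
    # Pull the (already reduced) training data onto the client and fit
    # plain scikit-learn.
    X = X_dd.compute()
    y = y_dd.compute()
    return SkLogisticRegression(max_iter=1000).fit(X, y)


def fit_distributed(X_dd, y_dd):
    # Keep the data as dask arrays so dask-ml can train across workers.
    X = X_dd.to_dask_array(lengths=True)
    y = y_dd.to_dask_array(lengths=True)
    return DaskLogisticRegression(max_iter=1000).fit(X, y)
```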

Query-20 GPU: cuml.cluster.KMeans

  1. Non-distributed CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-25 GPU: cuml.cluster.KMeans

  1. Non-distributed CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-26 GPU: cuml.cluster.KMeans

  1. Non-distributed CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans
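
For queries 20, 25, and 26 the CPU swap is the same; a minimal sketch, assuming the clustering features are already assembled into a Dask collection (the cluster count, random_state, and input name below are placeholders, not the values the queries actually use):

```python
# Illustrative only: the cluster count and the input collection are
# placeholders, not the actual parameters from queries 20/25/26.
from sklearn.cluster import KMeans as SkKMeans
from dask_ml.cluster import KMeans as DaskKMeans

N_CLUSTERS = 8  # hypothetical; each query defines its own cluster count


def cluster_non_distributed(features_dd):
    # Materialize the feature table locally and run scikit-learn KMeans.
    features = features_dd.compute()
    km = SkKMeans(n_clusters=N_CLUSTERS, random_state=0)
    labels = km.fit_predict(features)
    return km, labels


def cluster_distributed(features_dd):
    # dask-ml's KMeans (k-means|| init) operates directly on dask arrays.
    X = features_dd.to_dask_array(lengths=True)
    km = DaskKMeans(n_clusters=N_CLUSTERS, random_state=0)
    km.fit(X)
    return km, km.labels_
```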

Query-28 GPU: cuml.dask.naive_bayes

  1. Distributed CPU: dask_ml.naive_bayes
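
For Query-28, a hedged sketch of a non-distributed CPU fallback using scikit-learn's multinomial naive Bayes on hashed review text; the vectorizer choice, the MultinomialNB class, and the column names are assumptions for illustration, not the query's actual pipeline:

```python
# Hedged sketch of a non-distributed CPU fallback for the Query-28 ML step.
# The HashingVectorizer settings, the MultinomialNB choice, and the column
# names ("review_text", "label") are assumptions for illustration only.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB


def fit_nb_non_distributed(reviews_pdf):
    # Hash the review text into a non-negative sparse term matrix, then fit
    # multinomial naive Bayes on it.
    vectorizer = HashingVectorizer(alternate_sign=False, n_features=2**18)
    X = vectorizer.transform(reviews_pdf["review_text"])
    y = reviews_pdf["label"]
    return vectorizer, MultinomialNB().fit(X, y)
```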

CC: @DaceT, @randerzander

Related PRs:

https://github.com/rapidsai/gpu-bdb/pull/243

https://github.com/rapidsai/gpu-bdb/pull/244

ChrisJar commented 2 years ago
For query 5, it appears that using sklearn as a direct replacement for cuML is slightly faster than adjusting the query to use dask-ml:

| Run | Sklearn     | Dask-ml     |
| --- | ----------- | ----------- |
| 1   | 1731.960032 | 1976.639929 |
| 2   | 1713.143504 | 1890.307189 |
| 3   | 1692.447222 | 1819.198046 |
| 4   | 1679.160072 | 1800.853525 |
| 5   | 1663.727669 | 1791.983971 |
| Avg | 1696.0877   | 1855.796532 |

Edit: Here are the times running on a DGX-2:

| Run | Sklearn     | Dask-ml     |
| --- | ----------- | ----------- |
| 1   | 605.7754374 | 712.7177153 |
| 2   | 609.4057972 | 703.8873169 |
| 3   | 592.3652494 | 705.2219992 |
| 4   | 589.4770317 | 704.7177913 |
| 5   | 589.8500378 | 698.2876835 |
| Avg | 597.3747107 | 704.9665012 |

Edit 2: Here are the times running on 2 DGX-1s (TCP):

| Run | Sklearn     | Dask-ml     |
| --- | ----------- | ----------- |
| 1   | 865.8754275 | 984.3859689 |
| 2   | 833.6778433 | 968.5142105 |
| 3   | 814.666688  | 939.6765635 |
| 4   | 823.4441831 | 925.5529888 |
| 5   | 806.8892348 | 929.7718291 |
| Avg | 828.9106753 | 949.5803122 |
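
For context, a minimal sketch of the kind of per-run timing loop that could produce a table like the ones above; run_query_with is a placeholder, not the actual gpu-bdb benchmark harness:

```python
# Not the actual benchmark harness: run_query_with(backend) is a placeholder
# for however the query is invoked with a given ML backend.
import time


def time_runs(run_query_with, backend, n_runs=5):
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_query_with(backend)
        times.append(time.perf_counter() - start)
    return times, sum(times) / len(times)
```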
VibhuJawa commented 2 years ago

@ChrisJar, thanks for sharing these benchmarks. Do you have thoughts on how this might change if we scale to 10K? Not saying we should prioritize that, just wondering if you have any thoughts on that front.