microsoft / lightgbm-benchmark

Benchmark tools for LightGBM
MIT License

Generate bigger synthetic data using per-batch generation [regression only] #211

Closed: jfomhover closed this pull request 2 years ago

jfomhover commented 2 years ago

This PR implements a synthetic data generator that is not constrained by the memory limit (though it is still constrained by disk space).

It works by creating a synthetic data generator that produces batches of random data. The generator is iterated over to create the required amount of data for training, testing and inferencing, and all batches are appended sequentially to the output.
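A minimal sketch of per-batch regression data generation in Python, assuming a linear target with Gaussian noise and CSV output with the label in the first column. The names `generate_batches` and `write_batches` are hypothetical and not the PR's actual code. One design point matters: the coefficients are sampled once and shared, so every batch (and the train, test and inference files) follows the same target function, while only one batch is held in memory at a time.

```python
import os
import tempfile

import numpy as np


def generate_batches(coef, batch_size, n_batches, rng, noise=0.1):
    """Yield (X, y) batches from one fixed linear model.

    `coef` is shared by every batch, so all batches describe the
    same regression target; only the sampled rows differ.
    """
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, coef.shape[0]))
        y = X @ coef + rng.normal(scale=noise, size=batch_size)
        yield X, y


def write_batches(path, batches):
    """Write batches one after another to a CSV file (label first).

    Only the current batch is in memory; the file grows on disk.
    Returns the total number of rows written.
    """
    rows_written = 0
    with open(path, "w") as f:
        for X, y in batches:
            np.savetxt(f, np.column_stack([y, X]), delimiter=",", fmt="%.6f")
            rows_written += y.shape[0]
    return rows_written


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    coef = rng.normal(size=10)  # sampled once, reused for every split
    with tempfile.TemporaryDirectory() as out_dir:
        for split, n_batches in [("train", 8), ("test", 2), ("inference", 2)]:
            path = os.path.join(out_dir, f"{split}.csv")
            n = write_batches(path, generate_batches(coef, 1000, n_batches, rng))
            print(split, n, "rows")
```

Calling a fresh `make_regression`-style generator per batch with a new seed would instead draw new coefficients each time, silently mixing several target functions into one dataset.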

This is still limited by disk allocation for now.
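Since output is streamed to disk, free disk space becomes the binding limit. A rough pre-flight check could estimate the CSV size from the row and column counts. This is a hypothetical helper, not part of the PR; the default of about 10 characters per value is an assumption matching a `%.6f` format for values of modest magnitude, and the real size depends on the writer and format used.

```python
import shutil


def estimated_csv_bytes(n_rows, n_columns, chars_per_value=10):
    """Rough size of a CSV file with fixed-width numeric values.

    Each value costs `chars_per_value` characters plus one separator
    (a comma, or the newline after the last column).
    """
    return n_rows * n_columns * (chars_per_value + 1)


def fits_on_disk(directory, n_rows, n_columns, chars_per_value=10, margin=1.1):
    """Return True if the estimated output, plus a safety margin,
    fits in the free space of the filesystem holding `directory`."""
    needed = estimated_csv_bytes(n_rows, n_columns, chars_per_value) * margin
    return needed <= shutil.disk_usage(directory).free


if __name__ == "__main__":
    # Example: 100M rows x 101 columns (label + 100 features)
    print(estimated_csv_bytes(100_000_000, 101) / 1e9, "GB (approx.)")
    print(fits_on_disk(".", 100_000_000, 101))
```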

github-actions[bot] commented 2 years ago

Unit Test Results for Build

1 file, 1 suite, 1m 6s runtime: 97 tests, 97 passed, 0 skipped, 0 failed

Results for commit 99b87e13.

This comment has been updated with the latest results.

github-actions[bot] commented 2 years ago

Code Coverage

Package                                        Line Rate          Branch Rate  Complexity
common                                         88%                0%           0
pipelines.azureml                              83%                0%           0
scripts                                        100%               0%           0
scripts.data_processing.generate_data          93%                0%           0
scripts.data_processing.lightgbm_data2bin      95%                0%           0
scripts.data_processing.partition_data         92%                0%           0
scripts.inferencing.custom_win_cli             94%                0%           0
scripts.inferencing.lightgbm_c_api             75%                0%           0
scripts.inferencing.lightgbm_python            95%                0%           0
scripts.inferencing.treelite_python            94%                0%           0
scripts.model_transformation.treelite_compile  92%                0%           0
scripts.sample                                 93%                0%           0
scripts.training.lightgbm_python               80%                0%           0
Summary                                        87% (1516 / 1733)  0% (0 / 0)   0
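The Summary line rate is covered lines divided by total executable lines. A quick check of the reported figure (the helper name `line_rate` is illustrative only):

```python
def line_rate(covered_lines, total_lines):
    """Fraction of executable lines hit by the test suite."""
    return covered_lines / total_lines


rate = line_rate(1516, 1733)
print(f"{rate:.0%}")  # 87%
print(1733 - 1516, "lines not covered")  # 217 lines not covered
```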