Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
AutoML XGBoost model: abnormal memory usage as the dataset size grows from 100K to 10M rows #7716
Tested the AutoML XGBoost Classifier example on the Almaren YARN cluster (cluster mode) with script-generated sparse datasets ranging from 100,000 rows (0.7 GB) to 10 million rows (72 GB). Memory usage scales up abnormally as the dataset grows. The corresponding test results are as follows:
Otherwise, the application reports the following error:
(ImplicitFunc pid=18420, ip=172.16.0.135) 2023-02-28 20:16:58,965 ERROR function_runner.py:268 -- Runner Thread raised error.
(ImplicitFunc pid=18420, ip=172.16.0.135) Traceback (most recent call last):
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 262, in run
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._entrypoint()
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 331, in entrypoint
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._status_reporter.get_checkpoint())
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 451, in _resume_span
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 597, in _trainable_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 352, in train_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/bigdl/orca/automl/xgboost/XGBoost.py", line 158, in fit_eval
(ImplicitFunc pid=18420, ip=172.16.0.135)     self.model.fit(x, y, eval_set=eval_set, eval_metric=metric_name)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1397, in fit
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=self.enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 457, in _wrap_evaluation_matrices
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1396, in <lambda>
(ImplicitFunc pid=18420, ip=172.16.0.135)     create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 692, in __init__
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 881, in dispatch_data_backend
(ImplicitFunc pid=18420, ip=172.16.0.135)     feature_types)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 187, in _from_numpy_array
(ImplicitFunc pid=18420, ip=172.16.0.135)     ctypes.byref(handle),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 246, in _check_call
(ImplicitFunc pid=18420, ip=172.16.0.135)     raise XGBoostError(py_str(_LIB.XGBGetLastError()))
(ImplicitFunc pid=18420, ip=172.16.0.135) xgboost.core.XGBoostError: std::bad_alloc
(ImplicitFunc pid=18420, ip=172.16.0.135) Exception in thread Thread-2:
(ImplicitFunc pid=18420, ip=172.16.0.135) Traceback (most recent call last):
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/threading.py", line 926, in _bootstrap_inner
(ImplicitFunc pid=18420, ip=172.16.0.135)     self.run()
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 281, in run
(ImplicitFunc pid=18420, ip=172.16.0.135)     raise e
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 262, in run
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._entrypoint()
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 331, in entrypoint
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._status_reporter.get_checkpoint())
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 451, in _resume_span
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 597, in _trainable_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 352, in train_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/bigdl/orca/automl/xgboost/XGBoost.py", line 158, in fit_eval
(ImplicitFunc pid=18420, ip=172.16.0.135)     self.model.fit(x, y, eval_set=eval_set, eval_metric=metric_name)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1397, in fit
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=self.enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 457, in _wrap_evaluation_matrices
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1396, in <lambda>
(ImplicitFunc pid=18420, ip=172.16.0.135)     create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 692, in __init__
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 881, in dispatch_data_backend
(ImplicitFunc pid=18420, ip=172.16.0.135)     feature_types)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 187, in _from_numpy_array
(ImplicitFunc pid=18420, ip=172.16.0.135)     ctypes.byref(handle),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 246, in _check_call
(ImplicitFunc pid=18420, ip=172.16.0.135)     raise XGBoostError(py_str(_LIB.XGBGetLastError()))
(ImplicitFunc pid=18420, ip=172.16.0.135) xgboost.core.XGBoostError: std::bad_alloc
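The last XGBoost frame before `std::bad_alloc` is `_from_numpy_array`, which suggests the data reaches `DMatrix` as a dense NumPy array even though the datasets are described as sparse. That may be one reason memory grows faster than expected, though the issue does not confirm it. A rough sketch of the arithmetic follows; the feature count `N_COLS` and the fill ratio `DENSITY` are hypothetical placeholders, since the issue states neither, and float32 values with 32-bit indices are assumed.

```python
def dense_bytes(n_rows, n_cols, itemsize=4):
    """Bytes for a dense array holding every cell (float32 by default)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, n_cols, density, itemsize=4, index_size=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    nnz = round(n_rows * n_cols * density)
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


# Hypothetical shape: NOT taken from the issue, chosen only for illustration.
N_COLS = 1000
DENSITY = 0.01

for n_rows in (100_000, 1_000_000, 10_000_000):
    d = dense_bytes(n_rows, N_COLS)
    s = csr_bytes(n_rows, N_COLS, DENSITY)
    print(f"{n_rows:>10,} rows: dense {d / 2**30:8.2f} GiB, CSR {s / 2**30:8.2f} GiB")
```

Both footprints grow linearly with the row count, but the dense one pays for every zero cell, so its constant factor is far larger. If the input can be kept as a SciPy sparse matrix (which `xgboost.DMatrix` accepts), comparing peak memory against the dense path would help confirm or rule out this explanation.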