h2o4gpu: Genetic algorithm with Random Forest regression produces error: terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: out of memory #789
I am working on feature selection using a Genetic Algorithm (GA) with a Random Forest regression model (h2o4gpu.RandomForestRegressor). The number of estimators is 100; all other parameters are left at their defaults. The GA fitness function is the RF model's MAE. My dataset is 1.51 MB with dimensions 4000 x 44. However, after a certain number of iterations (say 30-40), every run of the program fails with one of the following errors:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: out of memory
Aborted (core dumped)
At other times, the program instead aborts with the following xgboost error:
terminate called after throwing an instance of 'dmlc::Error'
what(): [08:58:38] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/tree/../common/device_helpers.cuh: 422: out of memory
Stack trace:
[bt] (0) /conda/envs/rapids/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x24) [0x7f3f0b07fcb4]
[bt] (1) /conda/envs/rapids/xgboost/libxgboost.so(+0x3267e2) [0x7f3f0b2a57e2]
[bt] (2) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal >::EvaluateSplits(std::vector<int, std::allocator >, xgboost::RegTree const&, unsigned long)+0x1041) [0x7f3f0b2b48b1]
[bt] (3) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, xgboost::RegTree, dh::AllReducer)+0x131e) [0x7f3f0b2c7dfe]
[bt] (4) /conda/envs/rapids/xgboost/libxgboost.so(+0x34a201) [0x7f3f0b2c9201]
[bt] (5) /conda/envs/rapids/bin/../lib/libgomp.so.1(GOMP_parallel+0x42) [0x7f3f1c5bee92]
[bt] (6) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, std::vector<xgboost::RegTree, std::allocator<xgboost::RegTree> > const&)+0x918) [0x7f3f0b2bae98]
[bt] (7) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete > > >)+0xa81) [0x7f3f0b105791]
[bt] (8) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::ObjFunction)+0xd65) [0x7f3f0b106c95]
Aborted (core dumped)
The following are the specifications:
OS: Ubuntu 16.04.6 LTS
Python: 3.6.8
CUDA: 10.2 / cuDNN: 7.4.1
GPU model: Quadro GV100
Nvidia docker version: 18.09.6
RAM: 125 GB
h2o4gpu was installed using the pip wheel for CUDA 10.0 (https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/stable/ai/h2o/h2o4gpu/0.3-cuda10/h2o4gpu-0.3.2-cp36-cp36m-linux_x86_64.whl)
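For reference, below is a minimal sketch of the kind of GA fitness evaluation described above; it is hypothetical illustration, not my exact script. sklearn's RandomForestRegressor is used as an API-compatible stand-in for h2o4gpu.RandomForestRegressor so the snippet runs anywhere, and the synthetic data only mirrors the reported dataset shape (4000 x 44). The explicit `del`/`gc.collect()` per evaluation is one commonly suggested mitigation when GPU allocations accumulate across repeated model fits:

```python
# Hypothetical sketch of a GA fitness evaluation for feature selection.
# sklearn's RandomForestRegressor stands in for h2o4gpu.RandomForestRegressor
# (the APIs are compatible); the real script runs on GPU via h2o4gpu.
import gc
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((4000, 44))  # synthetic data matching the reported 4000 x 44 shape
y = rng.random(4000)

def fitness(mask):
    """MAE of an RF model trained on the feature subset selected by `mask`."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf  # empty feature subset: worst possible fitness
    model = RandomForestRegressor(n_estimators=100)  # other params default
    model.fit(X[:, cols], y)
    mae = mean_absolute_error(y, model.predict(X[:, cols]))
    # Explicitly drop the model after each evaluation; on GPU backends,
    # skipping this can let device allocations pile up across GA generations.
    del model
    gc.collect()
    return mae

mask = rng.integers(0, 2, size=44)  # one GA individual: a 0/1 feature mask
print(fitness(mask))
```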
Any suggestions on how to resolve this issue would be appreciated.