h2oai / h2o4gpu

H2Oai GPU Edition
Apache License 2.0

R random_forest_classifier problem with high dimensions #811

Closed bkavlak closed 4 years ago

bkavlak commented 4 years ago

Hi,

I am using the h2o4gpu package for random forest classification in R. I had no problems with data of up to 100 dimensions, but problems appear when I use data with more than 150 dimensions. I cannot pin down the problem exactly because the algorithm behaves inconsistently: it sometimes trains in fewer minutes, sometimes reports 100% accuracy, and sometimes stops at the prediction stage.

Is the algorithm not yet suitable for high-dimensional data?

sh1ng commented 4 years ago

Can you provide a reproducible example?

bkavlak commented 4 years ago

Sorry, I couldn't find a way to generate a reproducible example for high-dimensional data. It turned out that the accuracy issue was a problem with the data; however, I am still confused about why the algorithm sometimes trains in less time with more dimensions.

I cannot give more insight since I am not really an expert in the subject.
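
For anyone picking this up later, a reproducible example for this kind of report could start from synthetic data with a fixed seed. The following is only a minimal sketch: `make_data`, the row and column counts, the seed, and the train/test split are all hypothetical choices, not taken from the original data. It assumes the `h2o4gpu` R package is installed with a working GPU backend, and skips the model part otherwise.

```r
# Synthetic stand-in for the real data; sizes and seed are arbitrary assumptions.
make_data <- function(n_rows = 1000, n_cols = 200, seed = 42) {
  set.seed(seed)  # fixed seed so every run produces the same data
  x <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows))
  # The label depends on the first two columns plus noise, so 100% accuracy
  # on held-out rows would be suspicious rather than expected.
  y <- as.integer(x[[1]] + x[[2]] + rnorm(n_rows) > 0)
  list(x = x, y = y)
}

d <- make_data()
train <- 1:800
test <- 801:1000

# Only run the model part if h2o4gpu is available.
if (requireNamespace("h2o4gpu", quietly = TRUE)) {
  library(h2o4gpu)
  model <- h2o4gpu.random_forest_classifier()
  # Time the training step, since training time is one of the open questions.
  timing <- system.time(model <- fit(model, d$x[train, ], d$y[train]))
  pred <- predict(model, d$x[test, ])
  print(timing)
  print(mean(pred == d$y[test]))  # held-out accuracy
  print(sessionInfo())            # package versions to include in the report
}
```

Varying `n_cols` (for example 100 versus 200) and comparing the printed timings would show whether the shorter training time with more dimensions also appears on data that anyone can regenerate.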