kjw0612 / awesome-random-forest

Random Forest - a curated list of resources regarding random forest
http://jiwonkim.org/awesome-random-forest

Maximum features used for Random forest #11

Open dhwani2410 opened 4 years ago

dhwani2410 commented 4 years ago

I have a 4000 × 1400 matrix. Can I use it for a classification problem with random forest?

xuyxu commented 4 years ago

Given the size of the matrix, I assume the number of samples is 4000 and the number of features is 1400. A dataset of that size is still small and can be handled easily by standard RF implementations, such as sklearn.ensemble.RandomForestClassifier.
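As a rough illustration of the point above, here is a minimal sketch that fits a forest on data of the same shape. The data is synthetic (generated with make_classification), and the values for n_informative, n_estimators and the train/test split are assumptions chosen for the example, not settings taken from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 4000 x 1400 matrix (assumption: 4000 samples
# as rows, 1400 features as columns, binary labels).
X, y = make_classification(
    n_samples=4000, n_features=1400, n_informative=40, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# max_features="sqrt" means each split considers about sqrt(1400), i.e. 37,
# randomly chosen features, so a large feature count is not a problem.
clf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", n_jobs=-1, random_state=0
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

With real data, X and y would simply be replaced by the actual feature matrix and label vector; max_features is the parameter to tune if the default is not suitable.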

dhwani2410 commented 4 years ago

@AaronX121 thanks a lot for your reply. Could you also suggest how to overcome class imbalance in such cases?

xuyxu commented 4 years ago

sklearn.ensemble.RandomForestClassifier can handle class imbalance directly through the class_weight argument (e.g., put larger weights on classes with very few samples, or pass class_weight="balanced"). If this approach does not meet your requirements, I suggest addressing the problem outside the model by over-sampling or under-sampling the original dataset. This paper may be helpful: "Exploratory undersampling for class-imbalance learning".
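The two options mentioned above could look roughly like this. It is only a sketch: the imbalanced data is synthetic (about 90% / 10%), the feature count is reduced to keep it quick, and undersample is a hypothetical helper written for this example. It does plain random under-sampling of the majority class, which is simpler than the ensemble methods proposed in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): roughly 90% class 0, 10% class 1.
X, y = make_classification(
    n_samples=4000, n_features=50, n_informative=10,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Option 1: reweight classes inside the forest.
# "balanced" sets weights inversely proportional to class frequencies.
weighted = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", n_jobs=-1, random_state=0
)
weighted.fit(X_train, y_train)


def undersample(X, y, rng):
    """Hypothetical helper: randomly keep n_min samples of every class,
    where n_min is the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


# Option 2: balance the training set from the outside, then fit as usual.
# Only the training split is resampled; the test split keeps its imbalance.
rng = np.random.default_rng(0)
X_bal, y_bal = undersample(X_train, y_train, rng)
resampled = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
resampled.fit(X_bal, y_bal)

for name, model in [("class_weight", weighted), ("undersampled", resampled)]:
    score = balanced_accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: balanced accuracy = {score:.3f}")
```

Balanced accuracy is used here instead of plain accuracy because a model that always predicts the majority class would otherwise look deceptively good. Which option works better depends on the data, so it is worth comparing both.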