Open dhwani2410 opened 4 years ago
Given the size of the matrix, I suppose the number of samples is 4000 and the number of features is 1400. That is still small, and can easily be handled by standard RF implementations such as sklearn.ensemble.RandomForestClassifier.
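For reference, a minimal sketch of fitting a random forest on a matrix of this shape. The data and labels below are synthetic stand-ins (an assumption, since the actual matrix is not shown), and the hyperparameters are only illustrative defaults:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 4000 x 1400 matrix (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 1400))
# Hypothetical binary labels that depend on the first two features.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the samples, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# n_jobs=-1 trains the trees in parallel on all available cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Replace X and y with the real matrix and label vector; the rest should carry over unchanged.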
@AaronX121 Thanks a lot for your reply. Could you also suggest how to overcome class imbalance in such cases?
sklearn.ensemble.RandomForestClassifier can handle class imbalance directly through its class_weight argument (e.g., put larger weights on classes with very few samples, or pass class_weight="balanced" to weight classes inversely to their frequencies). If this approach does not meet your requirements, I suggest addressing the problem from the outside through over-sampling / down-sampling of the original dataset. This paper may be helpful: "Exploratory undersampling for class-imbalance learning".
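A rough sketch of both options, assuming a hypothetical imbalanced dataset (950 majority vs. 50 minority samples). Note that the second option is plain random down-sampling done with NumPy, which is simpler than the ensemble-based methods proposed in the paper above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 950 samples of class 0, 50 of class 1.
X = rng.normal(size=(1000, 20))
y = np.array([0] * 950 + [1] * 50)
X[y == 1] += 1.0  # shift the minority class so it is learnable

# Option 1: reweight classes inside the forest.
# "balanced" sets weights inversely proportional to class frequencies.
weighted = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
weighted.fit(X, y)

# Option 2: random down-sampling of the majority class before training.
minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)
kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
balanced_idx = np.concatenate([minority_idx, kept_majority])
X_bal, y_bal = X[balanced_idx], y[balanced_idx]

resampled = RandomForestClassifier(n_estimators=100, random_state=0)
resampled.fit(X_bal, y_bal)

print("class counts after down-sampling:", np.bincount(y_bal))
```

Down-sampling discards most of the majority-class samples, so it is worth comparing both options on a held-out set with a metric that is sensitive to the minority class (e.g., recall or F1) rather than plain accuracy.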
I have a 4000 × 1400 matrix. Can I use it for a classification problem with random forest?