pijush1285 / sigFeature

sigFeature: sigFeature is an R package that finds significant features using the support vector machine recursive feature elimination method (SVM-RFE) (Guyon, I., et al. 2002) and the t-statistic. Feature selection is an important part of machine learning. SVM-RFE is recognized as one of the most effective filtering methods. It is based on a greedy algorithm that finds the best feature combination for classification without considering which features differ significantly between the classes. To overcome this limitation of SVM-RFE, the proposed approach is tuned to find differentially significant features while retaining notable classification accuracy. The package can perform feature selection on any two-dimensional data set (for binary classification), such as microarray data. This vignette explains the use of the package on a publicly available microarray data set.
GNU General Public License v2.0

How to determine the final best number of features? #1

Open amjiuzi opened 5 years ago

amjiuzi commented 5 years ago

How do I determine the final best number of features? I understand that SVM-RFE is a good approach to feature selection,
but I don't see how to decide how many features is best. Can you give me some advice? @pijush1285

pijush1285 commented 5 years ago

Hello Amjiuzi,

I have not found any method that can determine the final best number of features.

When I was studying the paper "Gene Selection for Cancer Classification using Support Vector Machines" by Isabelle Guyon et al. (2002), I found that the authors mention the same problem and were trying to solve it. Finding the best number of features within a feature set is a challenging question. You could put the question to them.

It is true that SVM-RFE is a good algorithm for finding the features that classify with the highest accuracy. But it only gives you a ranking of those features. On the basis of this ranking, you can select the top N features for your classification.

If you wish to go deeper, you need an iterative process. This method will not give you an exact number of features, but it will give you confidence in the features you select. The method is given below.

Method: Split your data set into k folds (you should be familiar with external cross-validation and stratified cross-validation). Remove one fold from the data set and run SVM-RFE on the remaining samples to rank the features. Repeat the feature selection, leaving out a different fold each time. In the end you will have k ranked feature lists. Select features from these k lists on the basis of how often each one appears (for example, among the top N of each list), then rank the selected features by that frequency.
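A minimal sketch of the fold-wise procedure above, written in plain Python so it runs without any packages. It is not sigFeature code. All function names here are hypothetical, and `mean_difference_ranker` is only a stand-in for SVM-RFE (it ranks features by the gap between the two class means), so you would pass your own SVM-RFE ranking function as `rank_features`. The toy data is constructed so that features 0 and 2 separate the classes.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def mean_difference_ranker(X, y):
    """Stand-in ranker (NOT SVM-RFE): best feature first, by |class-mean gap|."""
    classes = sorted(set(y))
    n_features = len(X[0])

    def class_mean(cls, j):
        values = [row[j] for row, label in zip(X, y) if label == cls]
        return sum(values) / len(values)

    gaps = [abs(class_mean(classes[0], j) - class_mean(classes[1], j))
            for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gaps[j], reverse=True)


def feature_frequencies(X, y, k, top_n, rank_features=mean_difference_ranker, seed=0):
    """Rank features k times, leaving one fold out each time, and count how
    often each feature lands in the top_n of a ranking."""
    counts = Counter()
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        X_train = [row for i, row in enumerate(X) if i not in held]
        y_train = [label for i, label in enumerate(y) if i not in held]
        ranking = rank_features(X_train, y_train)
        counts.update(ranking[:top_n])
    # Most frequently selected features first; ties broken by feature index.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data: 20 samples, 4 features; features 0 and 2 differ between classes.
y_demo = [0] * 10 + [1] * 10
X_demo = [[10 * label + i % 3, i % 2, -10 * label + i % 4, (i * 7) % 5]
          for i, label in enumerate(y_demo)]

result = feature_frequencies(X_demo, y_demo, k=5, top_n=2)
print(result)  # [(0, 5), (2, 5)]
```

A feature that appears in all k lists is a stable choice. One that appears in only one or two lists depends heavily on which samples were left out. The frequency table still does not fix N for you, but it shows which features are selected consistently.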

Best regards, Pijush Das

On Fri, Oct 19, 2018 at 3:15 PM amjiuzi notifications@github.com wrote:

How do I determine the final best number of features? I understand that SVM-RFE is a good approach to feature selection, but I don't see how to decide how many features is best. Can you give me some advice? @pijush1285