open-connectome-classes / StatConn-Spring-2015-Info

introductory material

classifier #228

Open kristinmg opened 9 years ago

kristinmg commented 9 years ago

What would be a good classifier to implement (in MATLAB) if I'm using some calculated network measures as input? I was trying to implement a likelihood-ratio classifier, but I don't have much data, so I often get a zero (or low, but equal) probability of being in each group.
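For concreteness, this is the kind of rule I mean (a minimal sketch with made-up variable names: `x1`, `x2` hold the feature values for the two groups' training graphs, `xTest` is a new graph's value; fitting Gaussian class-conditional densities, as here, at least avoids exact zeros):

```matlab
% Likelihood-ratio rule with Gaussian class-conditional densities.
% x1, x2: feature values (e.g., degree) for training graphs in groups 1 and 2.
mu1 = mean(x1);  s1 = std(x1);
mu2 = mean(x2);  s2 = std(x2);
% Likelihood ratio for a new graph's feature value xTest:
lr = normpdf(xTest, mu1, s1) / normpdf(xTest, mu2, s2);
if lr > 1
    group = 1;   % group 1 is more likely
else
    group = 2;
end
```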

yaxigeigei commented 9 years ago

Are you classifying graphs instead of vertices? If so, what are the calculated network measures?

kristinmg commented 9 years ago

Yes, I'm classifying graphs, using measures such as degree or eigenvector centrality. I found the node giving the most significant difference between the true groups and am using the degree of just that node as the input (so the input to the classifier in that case is one value per graph/individual).
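Roughly, the selection step looks like this (a sketch with made-up names; `A` is an N-by-N-by-M stack of adjacency matrices, `labels` the group indicator, and the per-node two-sample t-test is just one choice of test):

```matlab
% A: N-by-N-by-M stack of adjacency matrices, labels: M-by-1 group indicator.
deg = squeeze(sum(A, 2));                                    % N-by-M node degrees
[~, p] = ttest2(deg(:, labels == 1)', deg(:, labels == 2)'); % per-node p-values
[~, bestNode] = min(p);                 % most significantly different node
feature = deg(bestNode, :)';            % one value per graph/individual
```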

mrjiaruiwang commented 9 years ago

It is fashionable these days to first consider SVMs (both linear and nonlinear, e.g., Gaussian or quadratic kernels) and deep learning. Those two seem to have a cult following of researchers who swear by their efficacy. However, a mathematically significant classifier is the 1-nearest-neighbor classifier, whose asymptotic misclassification rate is bounded between e* and 2e*, where e* is the misclassification rate of the Bayes classifier, the best classifier theoretically possible (although never attainable in practice) (Cover & Hart, 1967). There are also a few biologically motivated classifiers like k-TSP (invented by Dr. Don Geman here at Hopkins).
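For what it's worth, 1-NN is nearly a one-liner in MATLAB with the Statistics Toolbox (R2014a or later; `X`, `y`, `Xtest` are hypothetical names here):

```matlab
% X: M-by-d matrix of network measures (one row per graph), y: M-by-1 labels.
mdl = fitcknn(X, y, 'NumNeighbors', 1);              % 1-nearest-neighbor model
yhat = predict(mdl, Xtest);                          % labels for new graphs
cvErr = kfoldLoss(crossval(mdl, 'Leaveout', 'on'));  % leave-one-out error
```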

wrgr commented 9 years ago

Use Random Forest! See, for example: "An Empirical Evaluation of Supervised Learning in High Dimensions", Caruana et al., 2008.

"In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. We evaluate performance on three metrics: accuracy, AUC, and squared loss and study the effect of increasing dimensionality on the performance of the learning algorithms. Our findings are consistent with previous studies for problems of relatively low dimension, but suggest that as dimensionality increases the relative performance of the learning algorithms changes. To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs."

Plus it's easy and super simple for a novice to use. There's a great Google Code version. Post a question if you need me to link you.
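For example, a minimal sketch with MATLAB's built-in `TreeBagger` (Statistics Toolbox; `X`, `y`, `Xtest` are hypothetical names, and name-value spellings vary a bit across releases):

```matlab
% X: M-by-d feature matrix, y: M-by-1 class labels.
rf = TreeBagger(500, X, y, 'OOBPrediction', 'on');  % 500-tree random forest
yhat = predict(rf, Xtest);        % predicted labels (cell array of strings)
err = oobError(rf);               % out-of-bag error as trees are added
plot(err), xlabel('Number of trees'), ylabel('OOB error')
```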

mblohr commented 9 years ago

Naive Bayes is also good, if the independence assumption can be met (i.e., the features are approximately independent given the class).
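In MATLAB that would be, e.g., `fitcnb` (Statistics Toolbox, R2014b or later; hypothetical names again):

```matlab
% X: M-by-d features, y: M-by-1 labels. The default fits one Gaussian per
% feature and class, so likelihoods never hit exactly zero on small samples.
nb = fitcnb(X, y);
yhat = predict(nb, Xtest);
```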