mikegashler / waffles

A toolkit of machine learning algorithms.
http://gashler.com/mike/waffles/

KNN with Dense features and sparse distance metrics #8

Open skn123 opened 8 years ago

skn123 commented 8 years ago

I am reopening this issue:

https://github.com/mikegashler/waffles/issues/7

I think one part of the problem still remains: I cannot use

void GKNN::trainSparse(GSparseMatrix& feats, GMatrix& labs)

because my data matches this signature:

void GKNN::trainInner(const GMatrix& feats, const GMatrix& labs)

but I want to use a sparse distance metric.

skn123 commented 8 years ago

In fact, I think the problem is far from resolved. What actually needs to happen is that GCosineSimilarity (and the other similarity classes) should "also" be derived from GDistanceMetric, if we want to do it that way. Otherwise, a GSparseDistanceMetric should be accepted when calling "train" on GKNN.

Summary:
a.) I have a GMatrix containing dense features.
b.) I want to use GKNN.
c.) I want to use GCosineDistance (which is a sparse distance metric).

mikegashler commented 8 years ago

It could also work the other way. That is, sparse metrics can be applied to dense vectors, and dense metrics can be applied to sparse vectors, if the user is willing to take a hit in performance. So, it is not really clear to me which one should derive from the other.

Perhaps, the most flexible design would involve intelligent vectors and intelligent metrics that automatically detect when sparse methods can be applied, and "do the right thing" without bothering the user. But that would involve a big cost in overhead. It's a lot easier to just provide redundant metrics. (That is, a sparse version and a dense version for each distance metric. There are only about 3 of them, anyway.) In general, I tend to assume that people usually don't even resort to using sparse matrices and corresponding sparse metrics unless it makes a big difference in performance. So, I am having some trouble understanding why it is even important to support the inefficient cases of mixing sparse/dense vectors/metrics. Can you provide any example use cases?
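To make that trade-off concrete, here is a minimal, self-contained sketch (illustration only, not actual waffles code) of applying a sparse-style cosine similarity to dense rows; the dense-to-map conversion step is exactly the overhead being discussed:

#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Convert a dense row into the index->value form a sparse metric expects.
std::map<size_t, double> toSparse(const std::vector<double>& dense)
{
    std::map<size_t, double> sparse;
    for(size_t i = 0; i < dense.size(); i++)
        if(dense[i] != 0.0)
            sparse[i] = dense[i];
    return sparse;
}

// Cosine similarity over sparse maps; only indices present in both vectors
// contribute to the dot product.
double sparseCosine(const std::map<size_t, double>& a, const std::map<size_t, double>& b)
{
    double dot = 0.0, na = 0.0, nb = 0.0;
    for(std::map<size_t, double>::const_iterator it = a.begin(); it != a.end(); ++it)
    {
        na += it->second * it->second;
        std::map<size_t, double>::const_iterator match = b.find(it->first);
        if(match != b.end())
            dot += it->second * match->second;
    }
    for(std::map<size_t, double>::const_iterator it = b.begin(); it != b.end(); ++it)
        nb += it->second * it->second;
    if(na == 0.0 || nb == 0.0)
        return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}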

skn123 commented 8 years ago

Mike, I am not denying your point about use cases; let me give you an example. I am using hyperspectral data. A well-known metric for spectral similarity is SAM (spectral angle mapper). In your terminology, it is a cosine similarity. Now, the spectra that we have are all dense, hence with the current API I cannot use this metric!

However, there is a need to change the entire KNN class and make it "opaque" to any kind of distance metric. I have hacked my way through the code and made it work. If you check the code as it stands, the only distance metric one can use is GRowDistanceScaled and nothing else! I am proposing changing one thing in the code:

Every distance should be derived from GDistanceMetric and not GRowDistanceScaled. If that were the case, then we could add any dissimilarity metric; we would only need to override the virtual "squared_distance" method for each one.

In fact, we should be able to use any distance metric (and there are plenty of them) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.8446

Had we had a specialized linear algebra routine within Waffles, this would have been a one-line change.
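
For illustration, here is a minimal sketch of the shape I am proposing, using a stand-in abstract base and a made-up signature rather than the actual GDistanceMetric / "squared_distance" declarations in waffles:

#include <cmath>
#include <cstddef>

// Stand-in for GDistanceMetric: a single virtual hook that every metric overrides.
class DistanceMetric
{
public:
    virtual ~DistanceMetric() {}
    virtual double squaredDistance(const double* pA, const double* pB, size_t dims) const = 0;
};

// Dense cosine dissimilarity expressed through the same hook, so a KNN class
// that only knows about DistanceMetric can use it without modification.
class DenseCosineDistance : public DistanceMetric
{
public:
    virtual double squaredDistance(const double* pA, const double* pB, size_t dims) const
    {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for(size_t i = 0; i < dims; i++)
        {
            dot += pA[i] * pB[i];
            na += pA[i] * pA[i];
            nb += pB[i] * pB[i];
        }
        if(na == 0.0 || nb == 0.0)
            return 1.0; // treat a zero vector as maximally dissimilar
        double d = 1.0 - dot / (std::sqrt(na) * std::sqrt(nb)); // dissimilarity in [0, 2]
        return d * d;
    }
};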

mikegashler commented 8 years ago

I see that you are right. I enabled GKNN to support arbitrary distance metrics. As a temporary feature to unblock you, I also added a class called GDenseCosineDistance.

There are still two small issues that I need to sort out before I can make GSparseSimilarity derive from GDistanceMetric: (1) what is the right way to deal with attribute scaling in sparse cases?, and (2) how should GKNN perform distance-weighted interpolation when the metric only computes similarity?
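
One common workaround for (2), though I am not sure it is the right one here, would be to convert a similarity s into a distance before weighting, e.g. d = 1 - s for cosine similarity, or d = sqrt(2 * (1 - s)) for unit-length vectors (which equals the Euclidean distance between the normalized vectors); the existing inverse-distance weighting could then be reused unchanged.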

skn123 commented 8 years ago

IMO, GSparseSimilarity becomes redundant if all distances derive from GDistanceMetric. I have no clue about point #2 :) Can you point me to some technical details about that? In fact, GKNN should be built around the kernel trick! After all, any kernel is fair game (with a few base parameters for some specialized kernels). If done that way, it should then be easy to generalize this to non-linear semi-supervised learning as well (e.g., via Laplacian eigenmaps).

As I said, I have hacked my way through the code and made all distances derive from GDistanceMetric, except GLNormDistance.

On a second note: I have used all the supervised learners, and almost all of them gave consistent results, barring one: Gaussian Process. I am getting a classification accuracy of 0%! Any idea why that may be?

mikegashler commented 8 years ago

I think the sparse metrics still serve a useful purpose for improving performance in cases where highly sparse vectors are used. For example, one of my instance-based collaborative filters uses them.

I like your idea of incorporating the kernel trick into KNN, so I made a new distance metric called GKernelDistance that lets the user supply an arbitrary kernel to serve as the metric.
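
For reference, a kernel k induces a distance through the feature-space identity d^2(a, b) = k(a, a) - 2*k(a, b) + k(b, b). A minimal sketch of that idea, with a placeholder kernel interface rather than the actual waffles GKernel API:

#include <cstddef>

// Placeholder kernel interface (illustration only, not the waffles API).
class Kernel
{
public:
    virtual ~Kernel() {}
    virtual double apply(const double* pA, const double* pB, size_t dims) const = 0;
};

// Squared distance induced by a kernel k in its implicit feature space:
// ||phi(a) - phi(b)||^2 = k(a,a) - 2*k(a,b) + k(b,b).
double kernelSquaredDistance(const Kernel& k, const double* pA, const double* pB, size_t dims)
{
    return k.apply(pA, pA, dims) - 2.0 * k.apply(pA, pB, dims) + k.apply(pB, pB, dims);
}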

Regarding Gaussian Process: if your problem involves binary classification, then 0% is fantastic! Just invert the prediction =). The problem seems to be specific to your data, because I get non-zero accuracy across several UCI datasets and with several different kernels. I would guess that a divide-by-zero might be occurring somewhere, but I would not be able to debug it without repro steps.

skn123 commented 8 years ago

My problem involves dense features and is not binary classification. The API looks good now. Why leave out Pearson and Euclidean distance (actually, it should be inverse Euclidean distance, which in computer-vision parlance is also known as the Geman-McClure distance)? Adding those two should complete the set.

Now, do you think that GDistance should be bifurcated into GKernelDistance and GSparseDistance? All the distances that you have today should be derived from the "abstract" GKernelDistance, and I see that you have an API called "apply"; that should become the standard API for all distances (instead of the current squaredDistance). Then, GKNN can be templated to accept either a GKernelDistance (for dense distances) or a GSparseDistance. Any thoughts?
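
As a rough sketch of the templating idea (placeholder types only, not the current waffles classes, and assuming apply(a, b) returns a dissimilarity where lower means closer):

#include <cstddef>
#include <vector>

// Sketch only: a KNN helper parameterized on the metric type, so the same code
// compiles against a dense (kernel-style) metric or a sparse metric, as long
// as both expose an apply(a, b) member.
template <typename Metric, typename Row>
class TemplatedKnn
{
public:
    explicit TemplatedKnn(const Metric& metric) : m_metric(metric) {}

    // Returns the index of the stored row nearest to the query under the metric.
    // Assumes rows is non-empty.
    size_t nearest(const std::vector<Row>& rows, const Row& query) const
    {
        size_t best = 0;
        double bestDist = m_metric.apply(rows[0], query);
        for(size_t i = 1; i < rows.size(); i++)
        {
            double d = m_metric.apply(rows[i], query);
            if(d < bestDist)
            {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

private:
    Metric m_metric;
};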

skn123 commented 8 years ago

A couple of bugs:

// static
GDistanceMetric* GDistanceMetric::deserialize(GDomNode* pNode)
{
    const char* szClass = pNode->field("class")->asString();
    if(strcmp(szClass, "GRowDistance") == 0)
        return new GRowDistance(pNode);
    if(strcmp(szClass, "GLNormDistance") == 0)
        return new GLNormDistance(pNode);
    if(strcmp(szClass, "GDenseCosineDistance") == 0)
        return new GDenseCosineDistance(pNode);
    throw Ex("Unrecognized class: ", szClass);
    return NULL;
}

I had to add the GDenseCosineDistance case myself.

However, even with that fix, the code was still crashing when I tried to load the model from a GDom node:

[screenshot: knn_crash]

skn123 commented 8 years ago

I am reverting the changes; the parallelization that I had achieved by making copies of the model now crashes with this new code.

mikegashler commented 8 years ago

I fixed that bug and added a unit test to exercise it. Sorry for the inconvenience.

Regarding the interface, I agree that some improvements could be made. However, my current projects do not depend on distance metrics, so I am not in a hurry. If you are interested in making a branch for the purpose of refining this interface, I'll help as needed.

skn123 commented 8 years ago

Mike, indeed, I like this library, and I feel it can be greatly augmented with other external libraries (which is of course not the philosophy of waffles). I will keep you posted as and when I do it. Now, regarding the Gaussian Process algorithm, I have some questions:

a.) You state in the code that the GP algorithm is taken from Chapter 2 of Rasmussen's textbook. However, can you indicate where in the textbook I can find more information about the prediction routine? Or is it similar to latent variable modeling?

b.) I think I have traced the error in my classification to the computation of the pseudo-inverse on this line:

m_pLInv = pL->pseudoInverse();

I am getting garbage values after this step. I am using the default values, whereby I end up with a 350 x 350 matrix. However, you are in effect computing all 350 singular values, and I notice that most of the diagonal values are garbage.
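
My guess (unconfirmed) is that this is an ill-conditioning problem: an SVD-based pseudo-inverse forms A+ = V * Sigma+ * U^T, and the usual safeguard is to zero out the reciprocals of singular values below some tolerance. If that cutoff is missing or too small, the near-zero singular values of a 350 x 350 kernel matrix blow up into exactly this kind of garbage. A sketch of the safeguard I have in mind:

#include <algorithm>
#include <cstddef>

// Sketch: build the diagonal of Sigma+ from the singular values, zeroing any
// value below a relative tolerance so it does not explode when inverted.
void invertSingularValues(double* sv, size_t n)
{
    double maxSv = 0.0;
    for(size_t i = 0; i < n; i++)
        maxSv = std::max(maxSv, sv[i]);
    double tol = 1e-12 * maxSv * n; // typical relative cutoff
    for(size_t i = 0; i < n; i++)
        sv[i] = (sv[i] > tol) ? 1.0 / sv[i] : 0.0;
}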

mikegashler commented 8 years ago

a.) It looks like the first thing I do in GGaussianProcess::predict is use a kernel to compute the "distance" between the input feature vector "in" and each of the input feature vectors from the training set, "m_pStoredFeatures->row(i)". I store these computed "distances" in a temporary vector named "k". Next, I multiply "k" by the matrix "Alpha" to get the predicted output. Unfortunately, it has been so long since I wrote that code that I have no idea which equation in that textbook led me to write it.

b.) My implementation of GMatrix::pseudoInverse builds on GMatrix::singularValueDecompositionHelper, which I did not originally write. It has been stable enough in the past that I have never bothered to rewrite it, but I cannot vouch for its reliability.
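
For what it's worth, the computation described in a.) matches the standard GP predictive mean from Rasmussen & Williams, Chapter 2 (Algorithm 2.1 / Eq. 2.25), which is presumably the equation in question:

    \bar{f}_* = \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{y} = \mathbf{k}_*^\top \boldsymbol{\alpha}

where k_* holds the kernel evaluations between the query and each training input (the temporary vector "k" above), and alpha is precomputed from the training labels (the stored "Alpha").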