uhho / density-clustering

Density Based Clustering in JavaScript
MIT License
214 stars 33 forks source link

OPTICS: Core Distance Calculation #4

Closed uarga closed 10 years ago

uarga commented 10 years ago

Hey,

I believe that the core distance calculation is wrong.

Per definition, the core-distance of an object o is the smallest distance for o to be a core object.

Your function distanceToCore(pointId, neighbors):

minDistance = undefined;
if (neighbors.length >= this.minPts) {
    var minDistance = this.epsilon;
    neighbors.forEach(function(pointId2) {
        var dist = self.distance(self.dataset[pointId], self.dataset[pointId2]);
        if (dist < minDistance) minDistance = dist;
    });
}
return minDistance;

As far as I understand, you set the coreDistance to the distance to the nearest neighbor. That's wrong in genereal and would only be correct if there were a total of minPts many points as near as the nearest neighbor.

Here's a suggestion for distanceToCore:

OPTICS.prototype._distanceToCore = function(pointId) {

    var coreDistCand;
    for(coreDistCand = 0, l = this.epsilon; coreDistCand < l; coreDistCand++) {
      var neighbors = [];
      for (var id = 0, k = this.dataset.length; id < k; id++) {
        if (this.distance(this.dataset[pointId], this.dataset[id]) <= coreDistCand) {
          neighbors.push(id);
        }
      }
      if(neighbors.length >= this.minPts) {
        return coreDistCand;
      }
    }
    return -1;
}
uhho commented 10 years ago

@FredericMarkert Definitely agreed. Thanks a lot! I've changed just one thing; when looking for neighbours there is no need to include the core point itself. So I think we can use _regionQuery() function here.

Next time, please feel free to send a pull request!

uarga commented 10 years ago

I'm glad to be helpful.

About including the core point: I wasn't sure about that, as the original paper does not elaborate on whether it's a closed or open neighborhood...

uarga commented 10 years ago

Well, I'm confused. The following regards DBSCAN, but should be analogue to OPTICS:

In the DBSCAN wikipedia article, it is stated that core points are included:

regionQuery(P, eps): return all points within P's eps-neighborhood (including P)

In the article talk section, I asked whether this is a mistake:

The authors do not elaborate on whether it's a closed or open neighborhood. The last sentence of the article about neighborhoods [...] would suggest to exclude P unless it is explicitly stated otherwise.

Here's a reply:

In a database context, such as when using an R*-tree as recommended by the authors, the query would include the query point. It is part of the database, too

Can you clarify?

uhho commented 9 years ago

I've just released new version (1.2.0). From now core points are included in region queries, both in DBSCAN and OPTICS.