numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0
6.34k stars 1.56k forks

KNNClassifier improvements #2271

Open ryanjmccall opened 9 years ago

ryanjmccall commented 9 years ago

Improvements to KNNClassifier

ryanjmccall commented 9 years ago

@mihail911 @BoltzmannBrain FYI, recreating the tasks here since the old issue was closed.

BoltzmannBrain commented 9 years ago

Thank you @rmccall84. When pushing PRs for these tasks, please link with "addresses #2271"; this way the issue won't close :smile:

breznak commented 9 years ago

Modify KNN 'infer' so that client has option of outputting top n most frequent categories rather than just the most frequent

It would be nice if the n predictions could be sorted by probability. We had an issue on this: a plain dict is unordered in Python 2, so its entries can't be kept sorted by value, and this would need some hacking (for example, returning a sorted list of pairs). I.e., print infs -> {'a': 0.8, 'd': 0.5, 'b': 0.1, 'c': 0.1}
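
For illustration, a minimal sketch of ranking the inference results by probability. The helper name `top_n_categories` and its `n` parameter are hypothetical, not part of KNNClassifier; the input is assumed to be a dict mapping category to probability, as in the example above. Ties are broken by category name so the order is deterministic.

```python
from collections import OrderedDict


def top_n_categories(inferences, n=None):
    """Return (category, probability) pairs, highest probability first.

    Hypothetical helper: `inferences` is assumed to be a dict of
    category -> probability. Ties are broken by category name.
    """
    ranked = sorted(inferences.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if n is None else ranked[:n]


infs = {'a': 0.8, 'd': 0.5, 'b': 0.1, 'c': 0.1}

ranked = top_n_categories(infs)        # all categories, best first
top_two = top_n_categories(infs, n=2)  # only the two most probable

# If a mapping is still wanted, OrderedDict keeps the ranked order.
ordered = OrderedDict(ranked)
```

Returning a list of pairs sidesteps the dict-ordering problem entirely; the `OrderedDict` is only needed if callers expect dict-style lookup as well.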

BoltzmannBrain commented 9 years ago

@breznak in progress in #2284, specifically here :smile:

breznak commented 9 years ago

@BoltzmannBrain very nice! :+1: :+1: I'm just running some data and wondering why it looks much better :wink:

SaganBolliger commented 9 years ago

Another issue: removeCategory() seems to be broken. It raises AttributeError: 'KNNClassifier' object has no attribute '_categoryRecencyList'

SaganBolliger commented 9 years ago

There also seems to be an issue with infer having a side effect when using partitionId. In particular, this unit test fails:

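    # Assumes `import numpy as np` and that KNNClassifier has been imported
    # from NuPIC; this snippet is the body of a unittest.TestCase method.
    params = {"distanceMethod": "rawOverlap"}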
    classifier = KNNClassifier(**params)

    dimensionality = 40
    a = np.array([1, 3, 7, 11, 13, 17, 19, 23, 29], dtype=np.int32)
    b = np.array([2, 4, 8, 12, 14, 18, 20, 28, 30], dtype=np.int32)

    denseA = np.zeros(dimensionality)
    denseA[a] = 1.0

    denseB = np.zeros(dimensionality)
    denseB[b] = 1.0

    classifier.learn(a, 0, isSparse=dimensionality, partitionId=0)

    # The final assertion fails only when the next line (the first infer call)
    # is included, so infer must be changing the classifier's state.
    cat, _, _, _ = classifier.infer(denseA, partitionId=1)

    classifier.learn(b, 1, isSparse=dimensionality, partitionId=1)

    cat, _, _, _ = classifier.infer(denseA, partitionId=0)
    self.assertEqual(cat, 1)
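
To make the expected behaviour concrete, here is a minimal sketch of a side-effect-free classifier under the semantics the test above relies on: patterns stored under the `partitionId` passed to infer are skipped, and infer only reads state. `ToyPartitionedKNN` and its method signatures are hypothetical stand-ins, not NuPIC's implementation, and patterns are given as lists of active indices with raw overlap as the score.

```python
class ToyPartitionedKNN(object):
    """Hypothetical stand-in for illustration; not NuPIC's KNNClassifier."""

    def __init__(self):
        # Each entry is (active indices, category, partition id).
        self._memory = []

    def learn(self, active_indices, category, partition_id=None):
        self._memory.append((frozenset(active_indices), category, partition_id))

    def infer(self, active_indices, partition_id=None):
        """Return the category of the stored pattern with the highest raw
        overlap, skipping patterns stored under `partition_id`.
        This method only reads self._memory; it never modifies it."""
        query = frozenset(active_indices)
        best_category, best_overlap = None, -1
        for stored, category, stored_partition in self._memory:
            if partition_id is not None and stored_partition == partition_id:
                continue  # excluded partition
            overlap = len(stored & query)
            if overlap > best_overlap:
                best_category, best_overlap = category, overlap
        return best_category


a = [1, 3, 7, 11, 13, 17, 19, 23, 29]
b = [2, 4, 8, 12, 14, 18, 20, 28, 30]

knn = ToyPartitionedKNN()
knn.learn(a, 0, partition_id=0)

# Only pattern a is stored and partition 0 is not excluded here.
first = knn.infer(a, partition_id=1)
size_after_first_infer = len(knn._memory)

knn.learn(b, 1, partition_id=1)

# Pattern a (partition 0) is excluded, so only b can win.
second = knn.infer(a, partition_id=0)
```

In this sketch the intermediate infer call cannot influence the later result, which is the property the failing unit test checks for.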