KNNClassifier improvements

numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

http://numenta.org/

GNU Affero General Public License v3.0


KNNClassifier improvements #2271

Open ryanjmccall opened 9 years ago

ryanjmccall commented 9 years ago

Improvements to KNNClassifier

[ ] infer() should have options for sparse and dense inputs. Right now, for example, if the distance metric is 'rawOverlap', infer() must be called with a dense input even if the classifier was trained on a sparse input representation. If a sparse input is used, the classification is non-deterministic.
[ ] Add a getPatternCount() accessor method, and possibly other similar methods, for a nicer API
[x] Understand partitionId and implement, remove, or document appropriately
[ ] Understand and then remove or document rowID
[ ] Move leaveOneOutTest method from KNNClassifier module to an appropriate test module or delete
[ ] Remove unused arguments from infer() method
[ ] Move leaveOneOutTest to another module
[ ] SVD / PCA features need better docs
[ ] Reorder parameters in the constructor, putting dependent parameters after their dependencies
[ ] Modify KNN 'infer' so that the client has the option of outputting the top n most frequent categories rather than just the most frequent
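
To make the first task above concrete, here is a minimal sketch of a helper that normalizes either input form to a dense vector before inference. The names `to_dense` and `as_dense` are hypothetical and not part of KNNClassifier; the `is_sparse` convention (0 means dense, any other value is the dimensionality) is an assumption that mirrors how `isSparse=dimensionality` is passed to learn() in the test further down this thread.

```python
import numpy as np


def to_dense(indices, dimensionality):
    """Expand a list of active-bit indices into a dense 0/1 vector."""
    dense = np.zeros(dimensionality)
    dense[np.asarray(indices, dtype=np.int64)] = 1.0
    return dense


def as_dense(pattern, is_sparse=0):
    """Return a dense vector for either input form.

    Hypothetical convention: is_sparse == 0 means `pattern` is already
    dense; any other value is taken to be the dimensionality of a
    sparse (index-list) pattern.
    """
    if is_sparse:
        return to_dense(pattern, is_sparse)
    return np.asarray(pattern, dtype=np.float64)


# Both calls describe the same pattern and yield the same dense vector.
from_sparse = as_dense([1, 3], is_sparse=5)
from_dense = as_dense([0, 1, 0, 1, 0])
```

With a wrapper like this in front of infer(), a caller could pass the same representation used for training, and the classifier would always see one input form.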

ryanjmccall commented 9 years ago

@mihail911 @BoltzmannBrain FYI, recreating the tasks here since the old issue was closed

BoltzmannBrain commented 9 years ago

Thank you @rmccall84. When pushing PRs for these tasks, please link with "addresses #2271"; this way the issue won't close :smile:

breznak commented 9 years ago

Modify KNN 'infer' so that the client has the option of outputting the top n most frequent categories rather than just the most frequent

It would be nice if the n predictions could be sorted by probability. We had an issue on this: a plain dict has no guaranteed order, so it cannot stay sorted by its values, and this would need some hacking. E.g. print infs -> {'a': 0.8, 'd': 0.5, 'b': 0.1, 'c': 0.1}
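
One way to get that ordering, as a rough sketch: sort the dict's items by value and return a list of (category, score) pairs. The function name `top_n_categories` is hypothetical and not part of KNNClassifier, and breaking ties by category name is an assumption made here only to keep the output deterministic.

```python
def top_n_categories(inferences, n):
    """Return up to n (category, score) pairs, highest score first.

    Ties are broken by category so the result is deterministic.
    """
    ranked = sorted(inferences.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


infs = {'a': 0.8, 'd': 0.5, 'b': 0.1, 'c': 0.1}
top = top_n_categories(infs, 3)
print(top)  # [('a', 0.8), ('d', 0.5), ('b', 0.1)]
```

If callers need dict-style lookups as well as the ordering, the list can be wrapped in `collections.OrderedDict(top)`.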

BoltzmannBrain commented 9 years ago

@breznak in progress in #2284, specifically here :smile:

breznak commented 9 years ago

@BoltzmannBrain very nice! :+1: :+1: I'm just running some data and wondering why it looks much better :wink:

SaganBolliger commented 9 years ago

Another issue: removeCategory() seems to be broken. It raises AttributeError: 'KNNClassifier' object has no attribute '_categoryRecencyList'

SaganBolliger commented 9 years ago

There also seems to be an issue with infer having a side effect when using partitionId. In particular, this unit test fails:

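    # Imports needed by this snippet; the KNNClassifier path is assumed
    # from NuPIC at the time of this issue.
    import numpy as np
    from nupic.algorithms.KNNClassifier import KNNClassifier

    params = {"distanceMethod": "rawOverlap"}
    classifier = KNNClassifier(**params)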

    dimensionality = 40
    a = np.array([1, 3, 7, 11, 13, 17, 19, 23, 29], dtype=np.int32)
    b = np.array([2, 4, 8, 12, 14, 18, 20, 28, 30], dtype=np.int32)

    denseA = np.zeros(dimensionality)
    denseA[a] = 1.0

    denseB = np.zeros(dimensionality)
    denseB[b] = 1.0

    classifier.learn(a, 0, isSparse=dimensionality, partitionId=0)

    # The assertion below only fails when this line is included.
    # Therefore, infer has a side effect.
    cat, _, _, _ = classifier.infer(denseA, partitionId=1)

    classifier.learn(b, 1, isSparse=dimensionality, partitionId=1)

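    # Patterns stored under partitionId=0 are expected to be skipped here,
    # leaving only b (category 1) as a candidate.
    cat, _, _, _ = classifier.infer(denseA, partitionId=0)
    self.assertEqual(cat, 1)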
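
A generic way to confirm this kind of side effect, sketched under assumptions: snapshot the object's pickled state, run the supposedly read-only call, and compare. `ToyClassifier` below is a hypothetical stand-in with a deliberate bug, not the NuPIC KNNClassifier, and `has_side_effect` is an illustrative helper; whether the real classifier pickles deterministically is not checked here.

```python
import pickle


def has_side_effect(obj, call):
    """Return True if call(obj) changes obj's pickled state."""
    before = pickle.dumps(obj)
    call(obj)
    return pickle.dumps(obj) != before


class ToyClassifier(object):
    """Hypothetical stand-in, NOT the NuPIC KNNClassifier."""

    def __init__(self):
        self.patterns = []
        self.lastPartitionId = None

    def learn(self, pattern, category):
        self.patterns.append((tuple(pattern), category))

    def infer(self, pattern, partitionId=None):
        # Deliberate bug: a read-only query records state on the object.
        self.lastPartitionId = partitionId
        return self.patterns[0][1] if self.patterns else None


clf = ToyClassifier()
clf.learn([1, 3, 7], 0)

# infer() mutates the toy object, so the check reports a side effect.
mutated = has_side_effect(clf, lambda c: c.infer([1, 3, 7], partitionId=1))

# Merely reading an attribute leaves the state untouched.
untouched = has_side_effect(clf, lambda c: len(c.patterns))
```

Running the same check around the first infer() call in the test above would show whether that call changes the classifier's stored state, assuming the classifier can be pickled.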