ryanjmccall opened 9 years ago
@mihail911 @BoltzmannBrain FYI, I'm recreating the tasks here since the old issue was closed.
Thank you @rmccall84. When pushing PRs for these tasks, please link them with "addresses #2271" so that the issue isn't closed automatically :smile:
Modify KNN 'infer' so that the client has the option of outputting the top n most frequent categories rather than just the most frequent one
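One way to read this task, sketched in plain Python under the assumption that the classifier has already found the category of each of its k nearest neighbours: count how often each category occurs among them and return the n most common instead of only the winner. `topNCategories` and `neighbourCategories` are hypothetical names, not part of the KNNClassifier API.

```python
from collections import Counter


def topNCategories(neighbourCategories, n):
    """Return up to n (category, count) pairs, most frequent first.

    neighbourCategories is assumed (hypothetically) to hold the category
    label of each of the k nearest neighbours found by the classifier.
    """
    # Counter.most_common(n) orders by count, highest first.
    return Counter(neighbourCategories).most_common(n)


print(topNCategories([2, 0, 2, 1, 2, 0], 2))  # → [(2, 3), (0, 2)]
```

Returning (category, count) pairs rather than a dict keeps the ranking explicit for the caller.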
It would be nice if the n predictions could be sorted by probability. We had an issue on this: a plain dict can't be sorted by its values, so this would need some hacking.
I.e., print infs would give
-> {'a': 0.8, 'd': 0.5, 'b': 0.1, 'c': 0.1}
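A dict cannot be sorted in place, but its items can be. The sketch below assumes the inference output is a dict mapping category to probability, like the one above, and returns the categories ordered from most to least probable. `sortedPredictions` is a hypothetical helper, not an existing KNNClassifier method; wrapping the result in `collections.OrderedDict` keeps the order even on Python versions where plain dict order is not guaranteed.

```python
from collections import OrderedDict
from operator import itemgetter


def sortedPredictions(probabilities, n=None):
    """Return the n most probable categories as an OrderedDict, highest first."""
    # sorted() is stable even with reverse=True, so categories with equal
    # probability keep their original relative order.
    ranked = sorted(probabilities.items(), key=itemgetter(1), reverse=True)
    if n is not None:
        ranked = ranked[:n]
    return OrderedDict(ranked)


infs = {'a': 0.8, 'd': 0.5, 'b': 0.1, 'c': 0.1}
print(list(sortedPredictions(infs, n=3).items()))
# → [('a', 0.8), ('d', 0.5), ('b', 0.1)]
```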
@breznak in progress in #2284, specifically here :smile:
@BoltzmannBrain very nice! :+1: :+1: I'm just running some data and wondering why it looks much better :wink:
Another issue: removeCategory() seems to be broken. It raises AttributeError: 'KNNClassifier' object has no attribute '_categoryRecencyList'
There also seems to be a problem with infer() having a side effect when partitionId is used. In particular, this unit test fails:
params = {"distanceMethod": "rawOverlap"}
classifier = KNNClassifier(**params)
dimensionality = 40
a = np.array([1, 3, 7, 11, 13, 17, 19, 23, 29], dtype=np.int32)
b = np.array([2, 4, 8, 12, 14, 18, 20, 28, 30], dtype=np.int32)
denseA = np.zeros(dimensionality)
denseA[a] = 1.0
denseB = np.zeros(dimensionality)
denseB[b] = 1.0
classifier.learn(a, 0, isSparse=dimensionality, partitionId=0)
# The assertion below only fails when this line is included.
# Therefore, infer has a side effect.
cat, _, _, _ = classifier.infer(denseA, partitionId=1)
classifier.learn(b, 1, isSparse=dimensionality, partitionId=1)
cat, _, _, _ = classifier.infer(denseA, partitionId=0)
self.assertEqual(cat, 1)
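For comparison, here is a minimal stand-in for the behaviour the test expects. `ToyPartitionKNN` is hypothetical and is not NuPIC's KNNClassifier: it is a 1-nearest-neighbour classifier using raw overlap, where passing `partitionId` to infer() skips every stored pattern from that partition. The exclusion is decided per call and never stored on the object, so an earlier infer() cannot change the result of a later one. Replaying the test's sequence of calls gives category 1 at the end.

```python
import numpy as np


class ToyPartitionKNN(object):
    """Hypothetical 1-NN classifier with raw-overlap matching and partitions.

    Illustrates the expected semantics only; it is not NuPIC's class.
    """

    def __init__(self, dimensionality):
        self.dimensionality = dimensionality
        self._patterns = []    # dense binary vectors
        self._categories = []
        self._partitions = []

    def learn(self, activeBits, category, partitionId=None):
        # Store a sparse pattern (indices of active bits) in dense form.
        dense = np.zeros(self.dimensionality)
        dense[activeBits] = 1.0
        self._patterns.append(dense)
        self._categories.append(category)
        self._partitions.append(partitionId)

    def infer(self, denseInput, partitionId=None):
        # Skip the excluded partition with a local check only; nothing
        # about the exclusion is cached on self.
        bestCategory, bestOverlap = None, -1.0
        for pattern, category, partition in zip(
                self._patterns, self._categories, self._partitions):
            if partitionId is not None and partition == partitionId:
                continue
            overlap = float(np.dot(pattern, denseInput))
            if overlap > bestOverlap:
                bestCategory, bestOverlap = category, overlap
        return bestCategory


dimensionality = 40
a = np.array([1, 3, 7, 11, 13, 17, 19, 23, 29], dtype=np.int32)
b = np.array([2, 4, 8, 12, 14, 18, 20, 28, 30], dtype=np.int32)
denseA = np.zeros(dimensionality)
denseA[a] = 1.0

knn = ToyPartitionKNN(dimensionality)
knn.learn(a, 0, partitionId=0)
first = knn.infer(denseA, partitionId=1)   # only a is stored (partition 0)
knn.learn(b, 1, partitionId=1)
second = knn.infer(denseA, partitionId=0)  # a is excluded, only b remains
print(first, second)  # → 0 1
```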
Improvements to KNNClassifier
- partitionId: implement, remove, or document appropriately
- rowID
- leaveOneOutTest method: move from the KNNClassifier module to an appropriate test module, or delete
- leaveOneOutTest: move to another module
- Modify infer so that the client has the option of outputting the top n most frequent categories rather than just the most frequent one
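Since two of the items above concern partitionId and leaveOneOutTest, here is a small sketch of leave-one-out evaluation for an overlap-based 1-nearest-neighbour classifier. It assumes that leave-one-out means classifying each sample against all the others, which is equivalent to giving every sample its own partition and excluding it while it is classified. `leaveOneOutAccuracy` is a hypothetical helper, not the existing KNNClassifier method.

```python
import numpy as np


def leaveOneOutAccuracy(patterns, categories):
    """Fraction of samples correctly labelled by a raw-overlap 1-NN
    classifier that sees every sample except the one being classified.

    Requires at least two samples.
    """
    patterns = np.asarray(patterns, dtype=float)
    labels = np.asarray(categories)
    overlaps = patterns @ patterns.T       # pairwise overlap counts
    np.fill_diagonal(overlaps, -np.inf)    # a sample never matches itself
    nearest = np.argmax(overlaps, axis=1)  # best-matching other sample
    return float(np.mean(labels[nearest] == labels))


samples = [[1, 1, 1, 0, 0, 0],
           [1, 1, 0, 0, 0, 0],
           [0, 0, 0, 1, 1, 1],
           [0, 0, 0, 0, 1, 1]]
print(leaveOneOutAccuracy(samples, [0, 0, 1, 1]))  # → 1.0
```

Masking the diagonal with -inf is a vectorized way to do the per-sample exclusion; with ties, argmax picks the lowest index.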