Closed lwhite1 closed 8 years ago
Can you please share your code snippets and data with me privately? Thanks!
Sure. What's the best way to do that?
On Wed, Aug 31, 2016 at 8:37 PM, Haifeng Li notifications@github.com wrote:
Can you please share your code snippets and data with me privately? Thanks!
If the data is not too big, please email me at haifeng.hli@gmail.com. Thanks!
It looks like the problem is caused by duplicate samples in the data. I am working on enhancing CoverTree.
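To check whether a dataset contains exact duplicate samples before building a nearest-neighbor index, something like the following could be used. This is only a sketch in plain Java and is not part of Smile; the class name `DuplicateRows`, its methods, and the three sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Smile.
class DuplicateRows {
    /** Counts how many times each distinct feature vector occurs. */
    public static Map<List<Double>, Integer> countRows(double[][] x) {
        Map<List<Double>, Integer> counts = new HashMap<>();
        for (double[] row : x) {
            // A List<Double> key gives value-based equals/hashCode.
            // Note: Double.equals treats 0.0 and -0.0 as different values.
            List<Double> key = Arrays.stream(row).boxed().collect(Collectors.toList());
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Number of rows that exactly repeat some earlier row. */
    public static int duplicateCount(double[][] x) {
        return x.length - countRows(x).size();
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; the third repeats the first.
        double[][] x = {
            {0.0, 89.0},
            {1.0, 90.0},
            {0.0, 89.0}
        };
        System.out.println("duplicate rows: " + duplicateCount(x));
    }
}
```

A nonzero count means several training points sit at distance zero from each other, which is the situation described above.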
We fixed the bug. Your data should now run without problems with CoverTree. BTW, KNN is not a good method for your data. Many sample pairs have the same distance. Given a sample, you may get a lot of data points (> 9) at the same small distance. Different nearest neighbor data structures may return different sets of 9 samples, so the prediction may seem random.
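The effect of tied distances can be illustrated with a small brute-force example. This is a sketch in plain Java, not Smile's KNN or CoverTree code; the class `KnnTies`, the `order` parameter (standing in for the order in which a particular search structure happens to visit points), and the data are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration, not Smile's implementation.
class KnnTies {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Brute-force k-NN majority vote. Points at equal distance keep their
     * relative position from `order`, so ties are broken by visiting order.
     */
    public static int predict(double[][] x, int[] y, int[] order,
                              double[] q, int k, int numClasses) {
        Integer[] idx = new Integer[order.length];
        for (int i = 0; i < order.length; i++) {
            idx[i] = order[i];
        }
        // Arrays.sort on an object array is a stable sort.
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> squaredDistance(x[i], q)));

        int[] votes = new int[numClasses];
        for (int j = 0; j < k; j++) {
            votes[y[idx[j]]]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Six made-up training points, all at distance 1 from the query.
        double[][] x = {{0}, {0}, {0}, {2}, {2}, {2}};
        int[] y = {0, 0, 0, 1, 1, 1};
        double[] q = {1};

        int forward = predict(x, y, new int[]{0, 1, 2, 3, 4, 5}, q, 3, 2);
        int reverse = predict(x, y, new int[]{5, 4, 3, 2, 1, 0}, q, 3, 2);
        System.out.println("forward order predicts class " + forward);
        System.out.println("reverse order predicts class " + reverse);
    }
}
```

With every candidate at the same distance, the k neighbors selected depend only on visiting order, so the two calls vote for different classes even though the data and the query are identical.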
Thank you very much!
On Wed, Sep 21, 2016 at 9:00 AM, Haifeng Li notifications@github.com wrote:
We fixed the bug. Your data should now run without problems with CoverTree. BTW, KNN is not a good method for your data. Many sample pairs have the same distance. Given a sample, you may get a lot of data points (> 9) at the same small distance. Different nearest neighbor data structures may return different sets of 9 samples, so the prediction may seem random.
I'm still experiencing an IndexOutOfBounds exception on predict with the latest version from Maven (1.2.0). The code at the line where it happens in my 1.2.0 copy of Smile differs from the repository, so I think the fix has not yet been deployed to Maven in a new version.
v1.2.0 was released before this fix. We will release a new version soon. Thanks!
v1.2.1 has just been released with the fix. Thanks!
When I run a KNN model, it throws an ArrayIndexOutOfBoundsException on approximately every other run, using the same data. (In this case I'm randomly splitting the dataset between test and train, but I have the same issue if I run predict using the training set.)
On those runs where it does not throw an exception, it completes normally.
I'm mostly using defaults, with k = 5, and 14 predictor variables per instance. Sample data below the stack trace.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
    at smile.classification.KNN.predict(KNN.java:263)
    at smile.classification.KNN.predict(KNN.java:247)
    at com.github.lwhite1.tablesaw.api.ml.classification.Knn.predictFromModel(Knn.java:108)
    at com.github.lwhite1.tablesaw.api.ml.classification.AbstractClassifier.populateMatrix(AbstractClassifier.java:18)
    at com.github.lwhite1.tablesaw.api.ml.classification.Knn.predictMatrix(Knn.java:77)
These are the input predictor variables (first few rows) from a run that didn't fail:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 89.0, 0.0, 0.0]
[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 90.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 67.0, 1.0, 0.0]
[1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 63.0, 1.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 79.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 72.0, 1.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 88.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 83.0, 1.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 75.0, 1.0, 0.0]
[1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 90.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 63.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 49.0, 1.0, 0.0]
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 85.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 63.0, 0.0, 0.0]
[1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 86.0, 1.0, 0.0]
[1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 78.0, 1.0, 1.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 84.0, 0.0, 0.0]
[1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 86.0, 1.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 46.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 87.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 76.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 58.0, 0.0, 1.0]
[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 62.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 84.0, 0.0, 0.0]
[1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 72.0, 0.0, 0.0]