tdunning / knn

Large scale k-nn experiments
http://mahout.mapr.com

String.substring always returns null for corpus weighting #9

Status: Open. Opened by dfilimon 11 years ago.

dfilimon commented 11 years ago

At line 173 in Vectorize20NewsGroups.java [1], the substring call goes from beginIndex 1 to endIndex 1, which always returns an empty string. So the CorpusWeighting cw is always going to be null.

Did you run it to see if it works? :)

[1] https://github.com/tdunning/knn/commit/c09d742febf5242899b1c187c802d3bbb5164f0d#L0R173
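
For reference, Java's `String.substring(beginIndex, endIndex)` treats `beginIndex` as inclusive and `endIndex` as exclusive, so the result has length `endIndex - beginIndex`. When both arguments are equal the result is always the empty string, never `null`. A minimal demonstration; the string `"abc"` is an arbitrary illustration, not the project's actual weighting code:

```java
public class SubstringDemo {
    // substring(beginIndex, endIndex): beginIndex inclusive, endIndex exclusive.
    // The result has length endIndex - beginIndex, so equal indices give "".
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String code = "abc"; // arbitrary illustrative string
        System.out.println("[" + slice(code, 1, 1) + "]"); // prints "[]"
        System.out.println("[" + slice(code, 1, 2) + "]"); // prints "[b]"
    }
}
```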

tdunning commented 11 years ago

Ouch. Bit again by the index-or-length issue.

This is not a substring of the word; it is a substring of the code that controls the word weighting. I have run the code and don't understand how it avoided an NPE here.
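
The "index-or-length issue" is the confusion between APIs whose second argument is a length and `String.substring`, whose second argument is an exclusive end index. One way to avoid it is a small helper that takes a length explicitly and converts it to an end index. This is a sketch only: `sliceByLength` and the code string `"xyz"` are hypothetical and not part of the knn project.

```java
public class SliceByLength {
    // Hypothetical helper: returns `length` characters starting at `start`.
    // String.substring wants an exclusive end index, so convert explicitly.
    static String sliceByLength(String s, int start, int length) {
        if (start < 0 || length < 0 || start > s.length() || length > s.length() - start) {
            throw new IllegalArgumentException(
                "cannot take " + length + " chars at index " + start + " from \"" + s + "\"");
        }
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        String code = "xyz"; // hypothetical control code
        // Buggy pattern: passing the length (1) where the end index belongs yields "".
        System.out.println("[" + code.substring(1, 1) + "]"); // prints "[]"
        // Intended: one character starting at index 1.
        System.out.println("[" + sliceByLength(code, 1, 1) + "]"); // prints "[y]"
    }
}
```

Checking `length > s.length() - start` rather than `start + length > s.length()` avoids integer overflow for very large arguments.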


tdunning commented 11 years ago

Looks like I never ran this version:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.knn.Vectorize20NewsGroups$CorpusWeighting.parse(Vectorize20NewsGroups.java:175)
    at org.apache.mahout.knn.Vectorize20NewsGroups.main(Vectorize20NewsGroups.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
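
The trace shows the failure surfacing as a bare `NullPointerException` inside `CorpusWeighting.parse`. One defensive option is to have `parse` reject empty or unknown codes with a descriptive `IllegalArgumentException`, so a slicing bug fails fast with a message naming the bad input. The sketch below assumes the weighting is an enum keyed by a one-character code; the constants `NONE`/`IDF` and the codes `'n'`/`'t'` are hypothetical, not the project's actual values.

```java
public class WeightingParse {
    // Hypothetical weighting options; the real project's constants may differ.
    enum CorpusWeighting {
        NONE('n'), IDF('t');

        private final char code;

        CorpusWeighting(char code) {
            this.code = code;
        }

        // Returns the weighting for a one-character code, or throws with a clear message.
        static CorpusWeighting parse(String code) {
            if (code == null || code.length() != 1) {
                throw new IllegalArgumentException(
                    "expected a one-character weighting code but got \"" + code + "\"");
            }
            char c = Character.toLowerCase(code.charAt(0));
            for (CorpusWeighting w : values()) {
                if (w.code == c) {
                    return w;
                }
            }
            throw new IllegalArgumentException("unknown weighting code \"" + code + "\"");
        }
    }

    public static void main(String[] args) {
        System.out.println(CorpusWeighting.parse("t")); // prints "IDF"
        try {
            CorpusWeighting.parse(""); // the empty string is what substring(1, 1) produces
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```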


dfilimon commented 11 years ago

Yeah, no worries, I patched it up and ran it. Could you please look at the thread on the mailing list? :)