lintool / twitter-tools

Twitter Tools
twittertools.cc
218 stars 100 forks source link

Status.fromJson can fail, throwing an NPE #35

Closed isoboroff closed 11 years ago

isoboroff commented 11 years ago

I found some statuses in my crawl that don't have the retweet_count field. This causes a NPE: java.lang.NullPointerException at cc.twittertools.corpus.data.Status.fromJson(Status.java:162) at cc.twittertools.corpus.data.JsonStatusBlockReader.next(JsonStatusBlockReader.java:44) at cc.twittertools.corpus.data.JsonStatusCorpusReader.next(JsonStatusCorpusReader.java:48) at cc.twittertools.search.indexing.IndexStatuses.main(IndexStatuses.java:138)

Sure enough, Status.fromJson tries to set the retweetCount by blindly following obj.get()'s result without checking it. Since it's a NPE I can only get the stack trace by running in jdb.

I'm happy to submit a patch to check this case, but I want to know -- is there any reason to expect any of these fields to be in a given status? If not, is there a cleaner way to wrap these checks than using all these try-catch blocks?

milesefron commented 11 years ago

Please feel free to patch. I'm traveling and can get to it in a couple days but not right now. If I don't hear otherwise, I'll work on it in a day or two.

And feel free to modify if an approach other than those try/gets would work. None were obvious to me.

Sent from my iPhone

On May 20, 2013, at 1:02 PM, Ian Soboroff notifications@github.com wrote:

I found some statuses in my crawl that don't have the retweet_count field. This causes a NPE: java.lang.NullPointerException at cc.twittertools.corpus.data.Status.fromJson(Status.java:162) at cc.twittertools.corpus.data.JsonStatusBlockReader.next(JsonStatusBlockReader.java:44) at cc.twittertools.corpus.data.JsonStatusCorpusReader.next(JsonStatusCorpusReader.java:48) at cc.twittertools.search.indexing.IndexStatuses.main(IndexStatuses.java:138)

Sure enough, Status.fromJson tries to set the retweetCount by blindly following obj.get()'s result without checking it. Since it's a NPE I can only get the stack trace by running in jdb.

I'm happy to submit a patch to check this case, but I want to know -- is there any reason to expect any of these fields to be in a given status? If not, is there a cleaner way to wrap these checks than using all these try-catch blocks?

— Reply to this email directly or view it on GitHub.

isoboroff commented 11 years ago

I'll put up a basic patch in a pull request, and leave the redesign for some comments from Jimmy. The trivial answer is methods on a JsonObject like get_asLong("field") which can catch the null more cleanly, but Jimmy may have a better answer from the style perspective.

isoboroff commented 11 years ago

https://github.com/isoboroff/twitter-corpus-tools/commit/9c1432c6f96cab43d086310c2c6d4711d11c0074

isoboroff commented 11 years ago

This patch worked for me, completing a full index in about a day and a half on my Mac Pro running in all local storage. I'm going to mark this closed, and merge the patch.