hector-client / hector

a high level client for cassandra
http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/
MIT License

NPE in HConnectionManager while one host down #176

Closed tbax closed 13 years ago

tbax commented 13 years ago

I have a cluster of two nodes, and while one host is down I get an NPE in the HConnectionManager class (with RoundRobinPolicy). I'm running my application with hector 0.7.0-28 and the NPE is thrown at line 233:

    java.lang.RuntimeException
        at de.gad.nkr.sdc.logsearch.find.Search.execute(Search.java:117)
        at de.gad.nkr.sdc.infocenter.faces.flow.logsearch.Logsearch.search(Logsearch.java:916)
        at de.gad.nkr.sdc.infocenter.faces.flow.logsearch.Logsearch.loadDocs(Logsearch.java:762)
        at de.gad.nkr.sdc.infocenter.faces.flow.logsearch.Logsearch.run(Logsearch.java:898)
        at java.lang.Thread.run(Thread.java:619)
    Caused by: java.lang.NullPointerException
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
        at de.gad.nkr.sdc.logsearch.find.Search.execute(Search.java:112)
        ... 4 more

I debugged the code to the point where the NPE occurred. Extract from the code:

    ...
    if ( he instanceof HInvalidRequestException ||
         he instanceof HCassandraInternalException ||
         he instanceof HUnavailableException) {
      // break out on HUnavailableException as well since we can no longer satisfy the CL
      throw he;
    } else if ( he instanceof HectorTransportException) {
      --retries;
      client.close();   // ====>>>>> At this point client is null and numBlocked is incremented
      markHostAsDown(client);
      excludeHosts.add(pool.getCassandraHost());
      retryable = true;
      if ( retries > 0 ) {
        monitor.incCounter(Counter.RECOVERABLE_TRANSPORT_EXCEPTIONS);
      }
    }
    ..

zznate commented 13 years ago

Ouch. That is a pretty blatant miss. A fix has been committed to master and 0.7.0 branch. Thanks for bringing this up.
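
For readers landing here later: the committed fix itself is not shown in this thread, so the following is only a minimal sketch of the kind of null guard that avoids the NPE described above. It assumes the failure happens while borrowing a connection, so `client` is still null when the transport exception is caught. Every name in it (`FailoverNullGuardSketch`, `FakeClient`, `TransportException`, `borrow`) is a hypothetical stand-in and not part of Hector's API.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: models the failover loop with stand-in types, not Hector's real classes.
class FailoverNullGuardSketch {

    /** Hypothetical stand-in for a pooled connection. */
    static class FakeClient {
        final String host;
        boolean closed = false;
        FakeClient(String host) { this.host = host; }
        void close() { closed = true; }
    }

    /** Hypothetical stand-in playing the role of a transport-level exception. */
    static class TransportException extends RuntimeException {
        TransportException(String msg) { super(msg); }
    }

    final Set<String> excludedHosts = new HashSet<>();
    final Set<String> downHosts = new HashSet<>();

    /** Borrowing from the dead host fails before any client object exists. */
    FakeClient borrow(String host, String deadHost) {
        if (host.equals(deadHost)) {
            throw new TransportException("cannot connect to " + host);
        }
        return new FakeClient(host);
    }

    /** Tries each host in order; skips hosts whose connection attempt fails. */
    String operateWithFailover(String[] hosts, String deadHost) {
        int retries = hosts.length;
        for (String host : hosts) {
            FakeClient client = null;
            try {
                client = borrow(host, deadHost);
                return "ok:" + client.host;
            } catch (TransportException e) {
                --retries;
                // The guard: client is still null when borrow() itself failed,
                // so only close it if it was actually assigned.
                if (client != null) {
                    client.close();
                }
                // Mark the host by name rather than through the (possibly null) client.
                downHosts.add(host);
                excludedHosts.add(host);
                if (retries <= 0) {
                    throw e;
                }
            }
        }
        throw new TransportException("no hosts available");
    }

    public static void main(String[] args) {
        FailoverNullGuardSketch s = new FailoverNullGuardSketch();
        System.out.println(s.operateWithFailover(new String[] {"host1", "host2"}, "host1"));
    }
}
```

With two hosts and the first one down, the sketch records `host1` as down and excluded, then completes the operation against `host2` instead of throwing a NullPointerException.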