lucidworks / spark-solr

Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Apache License 2.0

HttpClient 3.1 classes are imported instead of HttpClient 4.x equivalents #272

Closed: theoathinas closed this issue 4 years ago

theoathinas commented 5 years ago

While upgrading a project to a newer version of Hadoop (3.2), we discovered that the spark-solr connector imports two HttpClient classes (NoHttpResponseException and ConnectTimeoutException) from the HttpClient 3.1 package (pulled in by Hadoop 2.7) instead of their HttpClient 4.x equivalents.
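For reference, here is a minimal sketch of the class-name mapping involved. The 3.1 names live in the `org.apache.commons.httpclient` package; the 4.x names are `org.apache.http.NoHttpResponseException` (HttpCore) and `org.apache.http.conn.ConnectTimeoutException` (HttpClient). The `HttpClientClassMapping` class and its `replacementFor` helper are hypothetical, for illustration only, and are not part of spark-solr or the linked PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: legacy HttpClient 3.1 class names mapped to their 4.x equivalents. */
public class HttpClientClassMapping {
    static final Map<String, String> LEGACY_TO_4X = new LinkedHashMap<>();

    static {
        // commons-httpclient 3.1 -> HttpCore 4.x
        LEGACY_TO_4X.put("org.apache.commons.httpclient.NoHttpResponseException",
                         "org.apache.http.NoHttpResponseException");
        // commons-httpclient 3.1 -> HttpClient 4.x
        LEGACY_TO_4X.put("org.apache.commons.httpclient.ConnectTimeoutException",
                         "org.apache.http.conn.ConnectTimeoutException");
    }

    /** Returns the 4.x replacement for a legacy class name, or null if none is listed. */
    static String replacementFor(String legacyName) {
        return LEGACY_TO_4X.get(legacyName);
    }

    public static void main(String[] args) {
        // Print the import swap for each class.
        LEGACY_TO_4X.forEach((oldName, newName) ->
            System.out.println("import " + oldName + ";  ->  import " + newName + ";"));
    }
}
```

Only string names are used here, so the sketch compiles and runs without either HttpClient version on the classpath.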

This surfaced during a test run of a Spark job that indexes our data into Solr. One of the hosts in our SolrCloud cluster went down, and a NoHttpResponseException should have been thrown. Instead, the class could not be found: we had excluded all Hadoop dependencies at runtime, so HttpClient 3.1 was not on the classpath.
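One way to confirm this kind of classpath gap before a job fails mid-run is to probe for each class by name. The sketch below is a hypothetical diagnostic (not part of spark-solr); it uses `Class.forName` with `initialize = false` so that no static initializers run, and reports whether the 3.1 and 4.x variants of each exception can be loaded.

```java
/** Hypothetical diagnostic: reports which HttpClient exception classes are loadable. */
public class HttpClientClasspathProbe {
    static final String[] CLASS_NAMES = {
        "org.apache.commons.httpclient.NoHttpResponseException",   // HttpClient 3.1
        "org.apache.http.NoHttpResponseException",                 // HttpCore 4.x
        "org.apache.commons.httpclient.ConnectTimeoutException",   // HttpClient 3.1
        "org.apache.http.conn.ConnectTimeoutException"             // HttpClient 4.x
    };

    /** True if the class can be found by this class's loader; static initializers are not run. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HttpClientClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : CLASS_NAMES) {
            System.out.printf("%-58s %s%n", name, isOnClasspath(name) ? "present" : "MISSING");
        }
    }
}
```

Running it with the same classpath as the Spark job (for example, with Hadoop dependencies excluded) should show whether the legacy 3.1 classes that the connector referenced are actually available.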

Referencing HttpClient 3.1 classes prevents the spark-solr connector from working with Hadoop 2.8 or later, whereas switching to the 4.x classes should not affect its integration with Hadoop 2.7.

theoathinas commented 5 years ago

Opened a PR for this: https://github.com/lucidworks/spark-solr/pull/273

kiranchitturi commented 4 years ago

Thank you for your contribution. Merged the PR.

theoathinas commented 4 years ago

Thanks!