While upgrading a project to a newer version of Hadoop (3.2), we discovered that the spark-solr connector imports two HttpClient classes (NoHttpResponseException and ConnectTimeoutException) from the HttpClient 3.1 package (included by Hadoop 2.7) instead of their HttpClient 4.x equivalents.
We found this during a test run of a Spark job that indexes our data into Solr: one of the hosts in our SolrCloud cluster went down, and a NoHttpResponseException should have been thrown. However, the class couldn't be found, because we had excluded all the Hadoop dependencies at runtime, which meant HttpClient 3.1 was never added as a dependency.
Referencing HttpClient 3.1 classes prevents the spark-solr connector from working with Hadoop 2.8 or later, whereas switching the references to the 4.x classes shouldn't affect its integration with Hadoop 2.7.
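
For anyone who wants to check their own runtime classpath, here is a minimal sketch (not part of the connector) that reports which generation of the two exception classes can be loaded. The class name `HttpClientClasspathCheck` and its helper methods are hypothetical. The fully qualified names assume the usual packaging: `org.apache.commons.httpclient` for HttpClient 3.1, and `org.apache.http` / `org.apache.http.conn` for 4.x.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of spark-solr.
final class HttpClientClasspathCheck {

    // Assumed locations of the two exceptions in Commons HttpClient 3.1.
    static final String[] LEGACY_3X = {
        "org.apache.commons.httpclient.NoHttpResponseException",
        "org.apache.commons.httpclient.ConnectTimeoutException"
    };

    // Assumed locations of the equivalent exceptions in HttpComponents 4.x.
    static final String[] CURRENT_4X = {
        "org.apache.http.NoHttpResponseException",
        "org.apache.http.conn.ConnectTimeoutException"
    };

    // Returns true if the named class can be loaded, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false,
                    HttpClientClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps each class name to whether it is loadable, in a stable order.
    static Map<String, Boolean> report() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : LEGACY_3X) {
            result.put(name, isPresent(name));
        }
        for (String name : CURRENT_4X) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        report().forEach((name, present) ->
                System.out.println((present ? "found    " : "missing  ") + name));
    }
}
```

If the 3.x names are reported missing while the 4.x names are found, the classpath matches the situation described above, where code compiled against the 3.1 classes fails as soon as one of them is needed.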