I am using the following code to index data into Solr from Dataproc:
// Read the CSV from Cloud Storage
val df = spark.read.format("csv").option("header", "true").load("gs://a-google-repository/books.csv")
df.show(15)

// Connection details for the exposed ZooKeeper service
val zkhost = "exposed-loadbalancer-ip-for-zookeeper:2181"
val collection = "my-collection"
val writeOpts = Map("zkhost" -> zkhost, "collection" -> collection, "batch_size" -> "10000", "commit_within" -> "30000")

// Write to Solr via the spark-solr connector, then issue an explicit commit
df.write.format("solr").options(writeOpts).save()
com.lucidworks.spark.util.SolrSupport.getCachedCloudClient(zkhost).commit(collection)
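As a first diagnostic, one thing I tried conceptually is checking from the Dataproc driver whether the exposed ZooKeeper endpoint accepts TCP connections at all. This is a minimal sketch using only the JDK; isReachable is a hypothetical helper, not part of spark-solr:

```scala
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: returns true if host:port (e.g. the exposed
// ZooKeeper endpoint) accepts a TCP connection within timeoutMs.
def isReachable(hostPort: String, timeoutMs: Int = 3000): Boolean = {
  val Array(host, port) = hostPort.split(":")
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port.toInt), timeoutMs)
    true
  } catch {
    case _: Exception => false
  } finally {
    socket.close()
  }
}

// Example: check the ZooKeeper endpoint used above
// isReachable("exposed-loadbalancer-ip-for-zookeeper:2181")
```

Note that even if ZooKeeper itself is reachable this way, the connector then connects directly to the Solr node addresses registered in ZooKeeper (solr-0, solr-1, ...), so those addresses must also be resolvable from the Dataproc workers.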
I am getting an error that it cannot find a live server (solr-0, solr-1, etc.). I suspect that using a LoadBalancer to expose the service might be the issue because of the StatefulSet. If that is the case, how can I get an IP address for the headless ZooKeeper service and access it from outside the cluster? Thanks