lucidworks / zeppelin-solr

Apache Solr interpreter for Apache Zeppelin
Apache License 2.0

Use HttpSolrClient to talk to Solr instead of CloudSolrClient #16

Closed: joel-bernstein closed this 5 years ago

joel-bernstein commented 5 years ago

This PR changes the internal client to HttpSolrClient instead of CloudSolrClient. There are a number of reasons for this change:

1) CloudSolrClient requires direct access to ZooKeeper, which may not be a good idea for security reasons because the same ZooKeeper ensemble could be managing other applications besides Solr.

2) CloudSolrClient requires a stateful, persistent connection to ZooKeeper, which can make it brittle when the network connection is unreliable.

3) Streaming Expressions will eventually have full support in non-SolrCloud mode.

4) Load balancing is easy to accomplish with nginx, so using CloudSolrClient as a load balancer inside Zeppelin itself is not really needed.

epugh commented 5 years ago

Really excited to see this patch... I can't use Zeppelin Solr with SearchStax's hosted Solr because ZK reports AWS-internal IP addresses, not externally accessible ones ;-(

As an interesting hack, I figured out I could grab the config file hosted in ZK, change the internal addresses to external ones, push it back, and then CloudSolrClient would connect! The change seems to last about six hours before it gets reset ;-)

This speaks to the challenge of having the "internal" network layout bleed out to the client... and the frustration of talking to ZooKeeper directly instead of talking to Solr over HTTP...

joel-bernstein commented 5 years ago

Yeah, this is an exciting fix. I think this is just about ready to merge and release.