lucidworks / zeppelin-solr

Apache Solr interpreter for Apache Zeppelin
Apache License 2.0

Connecting to a Solr Cloud running on AWS means URLs don't work #12

Closed epugh closed 5 years ago

epugh commented 5 years ago

This may really be a Solr problem rather than a problem with this Zeppelin plugin, but I found it here. Since this plugin finds the Solr servers via Zookeeper, the Solr URLs in Zookeeper need to be accessible from where Zeppelin is running.

On AWS, when folks deploy Solr, the base_url is not going to be publicly accessible, like this one:

"base_url":"http://10.0.9.109:8983/solr",

Look in state.json!

What I've been doing is just downloading the state.json, and changing the base_url to be the public IP address, and then just reuploading that back into Zookeeper!
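For reference, the manual edit described above can be sketched in a few lines of Scala. This is only an illustration under assumptions: `StateJsonRewrite`, `rewriteBaseUrls`, and the mapping are hypothetical names, the state.json text is assumed to be already downloaded as a string, and downloading or re-uploading it to Zookeeper is not shown. It touches only the `base_url` values and leaves every other field as it was.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of zeppelin-solr or SolrJ.
object StateJsonRewrite {
  // Matches entries such as "base_url":"http://10.0.9.109:8983/solr"
  private val BaseUrl: Regex = "\"base_url\"\\s*:\\s*\"([^\"]*)\"".r

  /** Replace every base_url value found in `mapping`; leave the others untouched.
    * Whitespace around the colon of a base_url entry is normalised away.
    */
  def rewriteBaseUrls(stateJson: String, mapping: Map[String, String]): String =
    BaseUrl.replaceAllIn(stateJson, m => {
      val current  = m.group(1)
      val replaced = mapping.getOrElse(current, current)
      // quoteReplacement keeps '$' and '\' in the URL from being read as group references
      Regex.quoteReplacement("\"base_url\":\"" + replaced + "\"")
    })
}
```

Calling `StateJsonRewrite.rewriteBaseUrls(json, Map("http://10.0.9.109:8983/solr" -> "http://<public-ip>:8983/solr"))` returns the edited text, where `<public-ip>` stands for whatever external address applies.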

kiranchitturi commented 5 years ago

That is annoying. Could the hostname for those Solr instances be changed to point to the external address? Otherwise, I guess Zeppelin needs to run in the same AWS group to access those private addresses.

epugh commented 5 years ago

I think AWS sets it up to use the IP addresses by default... So, yeah, my hacking of state.json works, until at some point Zookeeper (or somebody) updates the state.json back to the private addresses. It would be great to pass in some sort of mapping of internal to external addresses for these cases. And yes, I understand we would have to maintain that outside...

One thought: could I pass in the external URL via https://lucene.apache.org/solr/7_4_0//solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.Builder.html#Builder-java.util.List and have it work? Could that be the way around this? Augment the Zookeeper client to allow me to pass in a Solr URL instead of a ZK connection string?

epugh commented 5 years ago

Okay, I've made some progress poking around. My first hope, that using the builder with the external names would make it all work, didn't pan out:

val solrClientBuilder = new CloudSolrClient.Builder().withSolrUrl("https://myserver-us-east-1-aws.amazon.com/solr")

Yes, I can connect to the cluster this way, and it uses the HttpClusterStateProvider instead of the ZkClientClusterStateProvider, which by the way might be a better way to connect in general...

However, when it returns the clusterstate.json, it still has the internal IP addresses. See the screenshot below:

[Image: screenshot at dec 24 09-07-21]

What do you think about using the DelegatingClusterStateProvider?

https://lucene.apache.org/solr/7_5_0/solr-solrj/org/apache/solr/client/solrj/cloud/autoscaling/DelegatingClusterStateProvider.html

I could create that, and have it map baseUrls back to external addresses?
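The mapping step itself could look roughly like the sketch below. It is deliberately independent of SolrJ: `BaseUrlMapper`, `toExternal`, and the host map are hypothetical names for illustration, and wiring the function into a DelegatingClusterStateProvider (or any other provider) is not shown, since I haven't verified which methods would need to be overridden.

```scala
import java.net.URI

// Hypothetical helper, not part of SolrJ or zeppelin-solr.
object BaseUrlMapper {
  /** Map an internal base URL to its external equivalent.
    *
    * `hostMap` is keyed by internal "host:port" and holds the external root
    * (scheme, host and optional port). The path, e.g. "/solr", is preserved.
    * URLs with no matching entry are returned unchanged.
    * A malformed URL makes the URI constructor throw URISyntaxException.
    */
  def toExternal(baseUrl: String, hostMap: Map[String, String]): String = {
    val uri = new URI(baseUrl)
    val key = s"${uri.getHost}:${uri.getPort}"
    hostMap.get(key) match {
      case Some(externalRoot) =>
        externalRoot.stripSuffix("/") + Option(uri.getRawPath).getOrElse("")
      case None => baseUrl
    }
  }
}
```

With a map such as `Map("10.0.9.109:8983" -> "https://myserver-us-east-1-aws.amazon.com")`, the internal `http://10.0.9.109:8983/solr` would come back as the external `/solr` URL on that host. The map would still have to be maintained outside Solr, as noted earlier.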

kiranchitturi commented 5 years ago

Sorry for the delay, back from holidays.

Looking back at the code, there are so many methods hard-coded to zkhost that I don't know if it will work with just a Solr base URL.

Isn't it easier to just set the hostname to the external IP when starting Solr?

epugh commented 5 years ago

There are any number of issues, and honestly, this should probably be an improvement at the SolrJ client level rather than in this plugin for Zeppelin… :-(

There isn’t really any concept of an “internal map of IP addresses” versus an “external map of IP addresses” in Solr, which I think is because we use the same mechanism, ZK, both for communication inside the cluster and for external routing of queries…

I’m working with a hosted SolrCloud provider, so I don’t have the ability to change the address.

My hack for changing the state.json does work…. So I could keep doing that.

Eric

(Sent by email in reply to Kiran Chitturi's comment of Jan 2, 2019, 11:37 PM, quoted in full above.)

