mattweber / solr-for-wordpress

A WordPress plugin that replaces the default WordPress search with Solr
85 stars, 36 forks

Support for separated update/query Solr's #12

Open epugh opened 13 years ago

epugh commented 13 years ago

Matt,

I've been doing some hacking around analytics for Solr, and using a proxy to pick up the analytic data I need. My test bed has been our own website. I just hacked up the plugin to support separated query/update Solrs (or at least the end points!!).

I have a screenshot of the admin panel change: https://skitch.com/epugh/f2pt8/solr-options-open-source-search-engine-implementation-solr-lucene-search-integration-wordpress

There is one slight CSS issue in the formatting of the radio button options, but otherwise it works and starts with sane defaults. This is my first attempt at working with WordPress, so any feedback is appreciated. I thought others might have a similar need.

Eric Pugh

shaksi commented 13 years ago

Hey Eric,

I just had a look at your request and it looks good. Having looked at your changes, the first thing that came to mind, and something I have tried to tackle previously, is multicore search (or searching different servers).

We need to figure out a flexible way of using a single server connection function that decides which server it should be reading from or writing to.

Although I haven't worked it all out, the way I was envisioning it was to make $port/$host/$path an associative array (a similar UI to yours).

This would allow us to set a parameter in the URL whereby we can decide which core/server to retrieve the search results from. Naturally, we would only write/send our index to one server, but we could search as many as we need.

How does something like that sound?
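A minimal Python sketch of the proposal, assuming the $host/$port/$path settings become one dictionary keyed by a server ID. The IDs, hostnames, the server query parameter, and the helper names are hypothetical; the plugin itself is PHP and may store this differently. In this sketch, unknown or missing IDs fall back to the default search server, and indexing always targets a single server.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical server table: one entry per Solr server or core, keyed by ID,
# mirroring the proposed associative array of host/port/path settings.
SERVERS = {
    "master": {"host": "solr-master.example.com", "port": 8983, "path": "/solr"},
    "external": {"host": "solr-ext.example.com", "port": 8983, "path": "/solr/external"},
}

# Indexing always targets exactly one server; searching has a default.
UPDATE_SERVER_ID = "master"
DEFAULT_SEARCH_ID = "master"


def pick_search_server(request_url: str) -> dict:
    """Return the server named by the ?server= URL parameter, else the default."""
    params = parse_qs(urlparse(request_url).query)
    requested = params.get("server", [DEFAULT_SEARCH_ID])[0]
    return SERVERS.get(requested, SERVERS[DEFAULT_SEARCH_ID])


def pick_update_server() -> dict:
    """Writes never fan out: they always go to the single update server."""
    return SERVERS[UPDATE_SERVER_ID]
```

For example, http://blog.example.com/?s=solr&server=external would be searched against the "external" entry, while the same request without server= uses "master".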

epugh commented 13 years ago

Shakur,

Good to hear from you. I was trying to divine the best way to hook my code in, and then realized that it seemed to be set up for multiple different "connections". At any rate, the model I was working from was the Ruby Sunspot library for Solr, which allows you to separate the writer from the readers; that is where I was coming from.

Honestly though, I am not sure that I quite grok what you mean by multicore search. Are you suggesting issuing a query across solr1 and solr2, but not via the distributed search pattern? Or do you mean an architecture with 1 master and 2 slaves? I think load balancing between the various slaves is more the domain of the Solr backend than of the front-end WordPress plugin.

Any way you can incorporate a solution to the itch I am trying to scratch, separating the writer from the reader, totally works for me; I am not emotionally tied to my little bit of hackery! And I would love to go back to using the master branch!

Eric

Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.

mattweber commented 13 years ago

Eric,

This is great, especially if you want to index to a master and query off one or more slaves. If Shakur doesn't get a chance to merge this in, I will try to do it this weekend. Thanks for the patch!

Shakur,

Can you explain what you mean? To handle high QPS (queries per second), one would use this patch to index to a master, set that master instance up to replicate to some slaves, and set up load balancing between the slaves using HAProxy, nginx, etc. I don't think we want the PHP client doing the load balancing, even though I think it supports it.

Thanks, Matt Weber

shaksi commented 13 years ago

Okay, branching out to 'multi_core_and_servers'. I've just prepped things (merged Eric's work and am trying to build on top of it). It's a bit late at the moment, but I will give you guys an explanation of what I was talking about in a few hours.

epugh commented 13 years ago

Cool. And remember, from Solr's perspective, multiple cores look the same as multiple servers! They are all just URLs!
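To make Eric's point concrete, here is a small Python sketch (the helper solr_url is hypothetical) that builds request URLs from host, port, and path. A second core on the same Solr instance and a second Solr server differ only in their base URL, so client code can treat them identically. /select and /update are Solr's conventional search and update handler paths.

```python
def solr_url(host: str, port: int, path: str, handler: str) -> str:
    """Build a Solr request URL such as http://host:8983/solr/core1/select."""
    # Normalise slashes so "/solr/", "solr" and "/select" all combine cleanly.
    return f"http://{host}:{port}/{path.strip('/')}/{handler.lstrip('/')}"


# Two cores on one Solr instance...
core_a = solr_url("localhost", 8983, "/solr/core0", "select")
core_b = solr_url("localhost", 8983, "/solr/core1", "select")
# ...and a separate server look the same to the client: just different URLs.
other_server = solr_url("search2.example.com", 8983, "/solr", "select")
```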

shaksi commented 13 years ago

Yes, you are right :D

The above-mentioned branch works with the master search but can be extended to take any other server/core for that matter. We just need to somehow surface a UI for that.

shaksi commented 13 years ago

Okay lads, I have this in a stable state.

I have made it so that s4w_get_solr now accepts the server ID keys assigned during the initial setup. A 'master' is always defined as the default, and any number of slave servers can be defined.

From the plugin page, one can now decide which of the defined servers the plugin will use for search and for update. That is not the only option, though: we can also include a server parameter in the search, and if the ID provided is valid, that instance will be used for the search.

https://skitch.com/shaksi/f3mdh/solr-options-solr.test-wordpress <<< admin page

https://skitch.com/shaksi/f3mf8/another-search-results-solr.test << setting server parameter

https://skitch.com/shaksi/f3mfi/without-server-parameter << without the server parameter, the options set in the admin page are used

As per your instructions, a given user can go from a single-server setup to a multi-server setup, and vice versa, without any problem.

https://skitch.com/shaksi/f3mgj/single-server-instance https://skitch.com/shaksi/f3mg4/single-server-search

Hope that all makes sense. I have merged and pushed to master, so please have a play around. Feedback welcome.

mattweber commented 13 years ago

Is the code smart enough to allow only 1 update server selection or send documents to each update server when more than 1 is selected?

shaksi commented 13 years ago

As things stand, there is one canonical update server, which can be set to any of the defined servers. Similarly, one instance is chosen as the default for search, but search is not limited to it: I have allowed leeway for occasions where someone might want to search a different server than the default (e.g. one server indexes and returns WordPress content and the other some external data).

IMO, it's not the plugin's place to be meddling with sending data to more than one server; that is a setup issue. As you mentioned above, one indexes onto the master and lets it replicate to all the other instances as required.

epugh commented 13 years ago

I like it. Is there a limit to how many slaves you can have? I assume it's dynamic, not fixed to 1 master and 2 slaves?

Again, I guess it's good that the plugin can do the round-robin, but I would think round-robin via the plugin would be less common than having a real load balancer in front of the slaves.

Two nitpicks: 1) Do we need some messaging to tell users the difference between an update Solr and a query Solr? Or do we assume folks understand it if they are using the plugin?

2) The messaging under "Single Solr Server" should probably not refer to Solr 1.4, since the version changes. And really, I don't think that line, "Download, install, and configure your own Solr 1.4 instance", makes much sense...

mattweber commented 13 years ago

Yeah, I like this too. I'm not a fan of the option buttons for selecting index/search hosts. I would think the first host defined is the master and any others after that are search slaves. I really don't think anyone will use more than 1 search server, because a real load balancer will be considerably better.

Eric is right, we need to clean up the wording and make sure we define the differences between search and index servers. Users will get confused otherwise. Then again, maybe we just name that tab "Advanced" setup and assume the user knows what they are doing if they go in there.

Thanks, Matt Weber
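Matt's suggestion could be sketched in Python as follows, assuming roles are inferred purely from the order in which hosts are defined, with no per-host radio buttons. The function name assign_roles is hypothetical, and treating a lone host as both indexer and searcher is an assumption based on the single-server setup mentioned earlier in the thread.

```python
from typing import List, Tuple


def assign_roles(hosts: List[str]) -> Tuple[str, List[str]]:
    """First defined host indexes; every later host only serves searches.

    Assumption: with a single host, that host both indexes and serves searches.
    """
    if not hosts:
        raise ValueError("at least one Solr host must be defined")
    master, slaves = hosts[0], hosts[1:]
    return master, (slaves or [master])
```

The appeal of this design is that there is nothing extra to configure: adding a second host is what switches the plugin into the separated update/query mode.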

shaksi commented 13 years ago

Argh, it's late and I was running out of the office! Having reread your comment:

The answer is yes, it is smart enough and does only send the documents to the one designated server.

Where to send documents is decided by the option [s4w_server][type][update].

Similarly, where to search by default is decided by the option [s4w_server][type][search].

The above mentioned options contain ServerIDs generated by the plugin.
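A rough Python model of how those options might be read. Only the key names s4w_server, type, update, and search come from the comment above; the info sub-array, the server IDs, and the helper functions are hypothetical, and the real plugin keeps these in PHP option arrays.

```python
# Hypothetical shape of the stored plugin options.
OPTIONS = {
    "s4w_server": {
        # Hypothetical per-server connection details, keyed by server ID.
        "info": {
            "master": {"host": "localhost", "port": 8983, "path": "/solr"},
            "slave1": {"host": "search1.example.com", "port": 8983, "path": "/solr"},
        },
        # Server IDs chosen on the admin page for each role.
        "type": {"update": "master", "search": "slave1"},
    }
}


def update_server(opts: dict) -> dict:
    """Documents go only to the single server ID stored under type/update."""
    cfg = opts["s4w_server"]
    return cfg["info"][cfg["type"]["update"]]


def default_search_server(opts: dict) -> dict:
    """Searches default to the server ID stored under type/search."""
    cfg = opts["s4w_server"]
    return cfg["info"][cfg["type"]["search"]]
```

Because each role stores exactly one ID, there is no way in this model for a document to be sent to more than one update server.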

mattweber commented 13 years ago

So if more than 1 search server is defined, we only search the selected one? If yes, I think we need to use the load balancing built into the Solr PHP API and round-robin between them.

Thanks, Matt Weber
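Round-robin here simply means handing out the configured search servers in turn. The Python sketch below only illustrates that idea; the RoundRobin class is hypothetical and is not the Solr PHP client's balancer API. It has no health checks or failover, which is part of why the thread leans toward a real load balancer such as HAProxy or nginx.

```python
import itertools
from typing import Iterable


class RoundRobin:
    """Hand out search server URLs in turn, wrapping around at the end."""

    def __init__(self, urls: Iterable[str]) -> None:
        urls = list(urls)
        if not urls:
            raise ValueError("need at least one search server")
        # cycle() repeats the list forever: s1, s2, s1, s2, ...
        self._cycle = itertools.cycle(urls)

    def next_server(self) -> str:
        return next(self._cycle)
```

With two servers, three consecutive calls to next_server() return the first, the second, and then the first again.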

epugh commented 13 years ago

Sounds great!

Eric

shaksi commented 13 years ago

I agree the UI needs work! I don't know what the right way to go about it is.

The way I see it at the moment, we haven't actually been clear as to what we mean when we say master/slave!

Are those even the appropriate terms to use?

I like the idea of renaming the second tab as advanced.

@Eric With regard to the number of servers, there is no limit. Although round-robin could in theory be done with this plugin, the plugin itself does not implement it (on the fly, anyway).

What it does is give users the ability to define more than one server and to modify the search form to provide a dropdown of the servers that can be searched.

Example: http://bfc.staging.headshift.com (it uses radio buttons to allow users to select between Solr instances)

PS: I am assuming that the listed server instances might not contain the same data. Is that a fair assumption, or is it beyond the realm of a plugin for a specific CMS?

mattweber commented 13 years ago

Definitely beyond the realm of this plugin. We should only be searching across servers that contain data indexed using the plugin. I think a simple 2 server setup like the original patch is probably the way to go. For the majority of users that is all they will ever need and it reduces the complexity of using the plugin considerably.

epugh commented 13 years ago

+1 Aim for the simplest use case that solves the itch!

dustinrue commented 13 years ago

I'd say that if a user has more than one read server, they should be using some outside method for load balancing. I've been using an old version of the plugin, and it's great to see that I can now use two servers. In our setup we have a master Solr box with a couple of load-balanced read (search) servers. Solr for WordPress's current setup is perfect.

epugh commented 13 years ago

Glad to hear it!

shaksi commented 13 years ago

Awesomeness! Sorry I haven't got back to this discussion... @epugh & @mattweber, it's settled then: for advanced users we will only offer the simplest use case, master and slave, the former to take care of the CRUD business and the latter just for reading.