Closed GoogleCodeExporter closed 9 years ago
This has been discussed somewhat before, I believe on the google group. I'm not
against a patch like that, but I might not immediately integrate it. To do this
right, the Service and Response classes really need to be refactored so that
the way in which requests are sent to Solr is pluggable. There'd be either an
interface or abstract class that'd define the behavior, but the implementation
would be free to use whatever means deemed appropriate: file_get_contents,
curl, Zend_Http, etc.
Feel free to submit any patch, it only helps even if its not immediately
integrated - and if you take a shot at something like I laid out then that's
even more awesome :)
Original comment by donovan....@gmail.com
on 9 Sep 2010 at 2:15
Sounds workable; I'll try to get onto something like you suggested some time
soon.
Original comment by liam.obo...@gmail.com
on 16 Sep 2010 at 12:16
This suddenly sprang into urgency this weekend when I needed to rebuild a
search index with a few GB of documents in it; it appears that there is a
rather significant memory leak in the php stream context code, such that each
time you assign the contents to the http request that memory is not freed until
the process dies. I tried unsetting the options and the context, but it didn't
free the memory.
This is probably not a problem for an individual indexing request, but with a
daemon or a bulk indexing job it's a critical problem that makes the current
approach unusable. As such, I've rolled an emergency patch so that I could
rebuild our index, but it purely replaces the current approach with curl,
through changes in the Service and Response classes. If anyone else comes
across the need for this, I can add a patch here.
Hopefully I'll get on to a proper solution along the lines which Donovan has
suggested above later this week.
The following bugs refer to leaks with stream_context_create, but I suspect
it's a similar problem with the leak I have found.
* http://bugs.php.net/bug.php?id=50111
* http://bugs.php.net/bug.php?id=40257
Donovan, I was thinking that my implementation would be something like
* Create a factory class which the Service will use to create a Request object
(which submits requests to the service) and Response objects (which will
process the response).
* The factory can be injected into the Service; if none is set, it will
instantiate a default factory object which will create Requests and Responses
that do the current behaviour; this way the system will continue to behave as
it already does for backwards compatibility.
* I will also implement a factory which uses Curl instead, which will create
Requests and Responses that use Curl instead and handle the Curl format
response headers appropriately.
Original comment by liam.obo...@gmail.com
on 3 Oct 2010 at 4:19
What revision of the code are you using? In issue #20 / r21 I changed the code
to not create a new context for every request and instead to reuse 1 context
for get and 1 context for post. Are you using a single Apache_Solr_Service
instance or many throughout the indexing process?
Also, Timo Schmidt has given me a patch by email for switching out http
backends, which is why I created the http_requests branch to integrate it
without effecting trunk until I was satisfied with it. I'll get those changes
in asap so you don't have to duplicate effort there.
Original comment by donovan....@gmail.com
on 3 Oct 2010 at 5:05
I am using the one that only uses a single context, but the way that I am
calling it is inefficient and creates a new service for each request (a problem
I do need to address). I was trying to unset the option which has the document
content in it without success, but trying again today I am able to do so; the
problem was that I was losing an amount of memory equal to the document size
each request, not just the size of the stream context, so things went down hill
very quickly!
Anyhow, it is possible to unset the data if you set the option correctly, so
this is not so dire as I thought :) I'm still keen to roll the curl patch;
should I make it against the http_requests branch instead then?
Original comment by liam.obo...@gmail.com
on 3 Oct 2010 at 11:51
Hmm, running a diff between that branch and the trunk shows no differences,
e.g.
svn diff http://solr-php-client.googlecode.com/svn/trunk/
http://solr-php-client.googlecode.com/svn/branches/http_requests
returns nothing.
Original comment by liam.obo...@gmail.com
on 3 Oct 2010 at 11:43
I have not committed the differences yet. I will let you know here when I do.
Original comment by donovan....@gmail.com
on 4 Oct 2010 at 2:50
Here it is: r40
Original comment by donovan....@gmail.com
on 8 Oct 2010 at 1:46
The http_requests branch has been fully merged to trunk in r49. Thanks everyone.
Original comment by donovan....@gmail.com
on 21 Oct 2010 at 3:26
Original issue reported on code.google.com by
liam.obo...@gmail.com
on 9 Sep 2010 at 1:25