renjithkathirolil / solr-php-client

Automatically exported from code.google.com/p/solr-php-client

Very high memory usage on bulk insert of documents #20

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
When adding a lot of documents (100,000+) to Solr in one go, in this case via a CLI script used for an initial import, memory usage gets very high (over 500 MB).

I traced the memory usage to the function _sendRawPost in Service.php. For every request a new stream context is created, using about as much memory as the data in the request. With a big data set this adds up very quickly.

I managed to solve the issue by reusing the same context for each request and modifying the options to suit the next request. This way the memory usage remained very stable, even for a 600,000 document run.

Original issue reported on code.google.com by raspberr...@gmail.com on 14 Oct 2009 at 12:34
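
For context, here is a rough sketch of the per-request pattern the report describes; the function name solrPost and its details are illustrative, not the library's actual code:

```php
<?php
// Illustrative reduction of the pattern described in the report above:
// every POST builds a brand new stream context, and (per this thread)
// the memory held by those context resources is not reclaimed, so a
// bulk import's memory use keeps climbing.
function solrPost($url, $rawPost, $timeout = 60)
{
    // A fresh context per call: its options include a copy of the whole
    // request body, which is what adds up over hundreds of thousands of posts.
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => 'Content-Type: text/xml; charset=UTF-8',
            'content' => $rawPost,
            'timeout' => $timeout,
        ),
    ));

    return file_get_contents($url, false, $context);
}
```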

GoogleCodeExporter commented 9 years ago
I was able to quickly verify what you report. It seems very unfortunate that there is no way to free the memory from the stream context resource. Using a new context seemed cleaner code-wise, but it's obviously not acceptable. I'll move to reusing a single context.

Original comment by donovan....@gmail.com on 19 Oct 2009 at 4:29
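
A quick way to reproduce the growth being verified here is to watch memory_get_usage() while creating a context per request in a loop; the update URL below is an assumed local Solr instance, and the exact behavior depends on the PHP version in use at the time of this thread:

```php
<?php
// Minimal reproduction sketch (assumption: Solr's update handler at $url).
// Memory reported by memory_get_usage() climbs when a new context is
// created for every request instead of being reused.
$url  = 'http://localhost:8983/solr/update';
$body = '<add>' . str_repeat('<doc><field name="id">x</field></doc>', 100) . '</add>';

for ($i = 0; $i < 1000; $i++) {
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => 'Content-Type: text/xml; charset=UTF-8',
            'content' => $body,
        ),
    ));
    @file_get_contents($url, false, $context); // errors suppressed; only memory is of interest here

    if ($i % 100 === 0) {
        echo memory_get_usage(true), "\n";
    }
}
```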

GoogleCodeExporter commented 9 years ago
Moved to reusing a GET and a POST context instead of creating a new one for each request in r21.

Original comment by donovan....@gmail.com on 9 Nov 2009 at 10:09

GoogleCodeExporter commented 9 years ago
Further fix in r22 (the wrong stream context function was used to set options).

Original comment by donovan....@gmail.com on 9 Nov 2009 at 10:52
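
A minimal sketch of the fix described in the two comments above (r21/r22): create one context up front and update its options with stream_context_set_option() before each request, rather than calling stream_context_create() every time. The class and method names here are illustrative, not the library's exact API:

```php
<?php
// Illustrative version of the "reuse one context" approach from r21/r22.
class SolrPoster
{
    private $postContext;

    public function __construct()
    {
        // Created once and reused for the life of the object.
        $this->postContext = stream_context_create();
    }

    public function post($url, $rawPost, $timeout = 60)
    {
        // r22's point: the options must be (re)set on the existing context
        // with the correct function, stream_context_set_option().
        stream_context_set_option($this->postContext, array(
            'http' => array(
                'method'  => 'POST',
                'header'  => 'Content-Type: text/xml; charset=UTF-8',
                'content' => $rawPost,
                'timeout' => $timeout,
            ),
        ));

        return file_get_contents($url, false, $this->postContext);
    }
}
```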

GoogleCodeExporter commented 9 years ago
I am having a rather large performance issue which I think is related to this. I am using the newest code, but I think it is still leaking. I am trying to index between 6 and 10 million documents, and even with a PHP memory limit of 4 GB I get to maybe 1 million before it eats up the memory. I have tried doing this in chunks of 100,000, 10,000, and 1,000, and it all just dies; the problem seems to be around this function.

Thoughts? Better approaches?

Original comment by ave...@gmail.com on 30 Aug 2010 at 3:51

GoogleCodeExporter commented 9 years ago
Are you using the SVN version of the code? It now reuses a context. If you are and are still seeing memory climb, then I'd check whether you're holding onto documents somewhere. If that still doesn't work, you could try breaking the work into several processes.

Original comment by donovan....@gmail.com on 30 Aug 2010 at 4:05
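
An illustrative usage pattern along the lines of the suggestion above: build documents in small batches, hand each batch to the service, then drop the references so PHP can free them before the next chunk. It assumes the library's Apache_Solr_Service and Apache_Solr_Document classes; the data source fetchRows() is hypothetical.

```php
<?php
require_once 'Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('localhost', 8983, '/solr/');

$chunkSize = 1000;
while ($rows = fetchRows($chunkSize)) {        // fetchRows() is a hypothetical data source
    $documents = array();
    foreach ($rows as $row) {
        $doc = new Apache_Solr_Document();
        $doc->id    = $row['id'];
        $doc->title = $row['title'];
        $documents[] = $doc;
    }

    $solr->addDocuments($documents);

    // Drop references so the batch can be garbage collected before the
    // next chunk; otherwise memory use grows with every iteration.
    unset($documents, $rows);
}

$solr->commit();
$solr->optimize();
```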