pombreda / djapian

Automatically exported from code.google.com/p/djapian

Remove unnecessary (and even harmful) database.flush() from Indexer.update() #107

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have a fairly large Xapian database (~7 GB) and I noticed that its disk IO
has grown unreasonably large lately.

Most of the time is spent in the call to database.flush() in
Indexer.update(), on line 279 of indexer.py (r347).

According to the Xapian docs
http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#c4d8b73dcc239e57343fa0c876ef8966
a transaction created with begin_transaction(flush=True) implicitly calls
flush() before and after the transaction, and Djapian creates its transaction as
database.begin_transaction(flush=True) (see line 214 of indexer.py).

So it seems that removing database.flush() should be safe. 
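
To illustrate the reasoning above, here is a minimal sketch that counts flushes with and without the explicit call. FakeDatabase is a hypothetical stand-in written for this example, not the Xapian API; it only models the behaviour described above (a transaction opened with flush=True flushes once when it begins and once when it is committed). The update() helper and its extra_flush parameter are likewise assumptions, not Djapian's actual code.

```python
class FakeDatabase:
    """Hypothetical stand-in for a writable database; NOT the Xapian API.

    It only models the behaviour described in the report: a transaction
    opened with flush=True flushes when it begins and when it is committed.
    """

    def __init__(self):
        self.flush_count = 0
        self._flush_on_commit = False

    def flush(self):
        # In a real database this would be the expensive disk write.
        self.flush_count += 1

    def begin_transaction(self, flush=True):
        self._flush_on_commit = flush
        if flush:
            self.flush()

    def commit_transaction(self):
        if self._flush_on_commit:
            self.flush()

    def replace_document(self, doc_id, doc):
        pass  # the indexing work itself does not matter for the flush count


def update(database, docs, extra_flush):
    """Index docs in one transaction, optionally followed by an explicit flush."""
    database.begin_transaction(flush=True)
    for doc_id, doc in docs.items():
        database.replace_document(doc_id, doc)
    database.commit_transaction()
    if extra_flush:
        database.flush()  # the call this issue proposes to remove
    return database.flush_count


docs = {1: "first", 2: "second"}
with_extra = update(FakeDatabase(), docs, extra_flush=True)
without_extra = update(FakeDatabase(), docs, extra_flush=False)
print(with_extra, without_extra)  # 3 2
```

Under these assumptions the explicit call adds a third flush per update() on top of the two the transaction already performs, which is consistent with the extra disk IO I am seeing.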

According to my tests, removing database.flush() doesn't change the fact that
data is written to disk immediately: e.g. I can find a just-indexed object
right after a transaction has been committed, even before the database
object is destroyed (which also causes the database to flush).

Also, removing this line greatly reduced disk IO on my system. I can't give
you scientific benchmarks, but it made an improvement that I could see
with my own eyes.

I suggest removing that database.flush() as it's clearly not necessary, and
could be beneficial (and in my case it was).

Attached is the (trivial) patch to remove the offending database.flush()

Original issue reported on code.google.com by redvas...@gmail.com on 22 Feb 2010 at 3:29


GoogleCodeExporter commented 9 years ago

Original comment by daevaorn on 22 Feb 2010 at 4:35

GoogleCodeExporter commented 9 years ago
Fixed in r349

Original comment by daevaorn on 22 Feb 2010 at 5:46