Updating the index on tsfullpath seems to be a very slow procedure, so I
suggest removing the tsfullpath-filling query from the regular spider run
and executing it on a schedule instead (e.g., twice per day).
We can introduce two spider.py command-line options:
--no-fullpaths: skip the immediate tsfullpath update
--update-fullpaths: update tsfullpath
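
A minimal sketch of how these two options might be parsed. The option names come from this issue; the use of argparse, the build_parser helper, and the help texts are assumptions for illustration, not the existing spider.py code.

```python
import argparse


def build_parser():
    """Build a parser with the two options proposed in this issue.

    build_parser is a hypothetical helper; spider.py may parse its
    arguments differently.
    """
    parser = argparse.ArgumentParser(prog="spider.py")
    parser.add_argument(
        "--no-fullpaths",
        action="store_true",
        help="do not update tsfullpath immediately after scanning",
    )
    parser.add_argument(
        "--update-fullpaths",
        action="store_true",
        help="update tsfullpath (intended for the scheduled run)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse turns "--no-fullpaths" into the attribute "no_fullpaths".
    print(args.no_fullpaths, args.update_fullpaths)
```

With this layout a scheduled job could call "spider.py --update-fullpaths", while regular scans pass "--no-fullpaths".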
The query "CREATE INDEX files_tsfullpath ON files USING gin(tsfullpath);"
takes about 220 seconds. So, when updating tsfullpath, we could improve
performance by dropping the index first and recreating it afterwards (this
requires uguuscript to be the owner of the files table, or storing uguu's
password somewhere).
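
The drop-and-recreate idea could look roughly like the sketch below. The CREATE INDEX statement is the one quoted in this issue. The update_fullpaths function, the fill_query parameter (the actual tsfullpath-filling query is not shown here), and the use of a generic DB-API cursor are assumptions.

```python
DROP_INDEX = "DROP INDEX IF EXISTS files_tsfullpath;"
CREATE_INDEX = "CREATE INDEX files_tsfullpath ON files USING gin(tsfullpath);"


def update_fullpaths(cursor, fill_query):
    """Refill tsfullpath without maintaining the GIN index row by row.

    cursor     -- any DB-API cursor connected as the owner of table files
    fill_query -- the tsfullpath-filling SQL, supplied by the caller
                  (hypothetical parameter; the real query is not in this issue)
    """
    # Drop the index so the bulk update does not have to maintain it.
    cursor.execute(DROP_INDEX)
    # Fill tsfullpath for the rows that need it.
    cursor.execute(fill_query)
    # Rebuild the index once, after all rows are in place.
    cursor.execute(CREATE_INDEX)
```

Running the three statements inside one transaction would keep other sessions from seeing the table without its index, at the cost of holding a lock on files for the whole rebuild.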
This optimization would also give us a chance to run multiple spiders
simultaneously, but we should add some protection when selecting shares, so
that two spiders do not pick the same share.
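
One possible form of that protection is a PostgreSQL session-level advisory lock keyed on the share. This is only a sketch of one option, not something this issue prescribes: the function names and the integer share_id key are assumptions, and row locking with SELECT ... FOR UPDATE would be another way to do it.

```python
def try_claim_share(cursor, share_id):
    """Try to claim a share for this spider without blocking.

    Returns True if this session now holds the advisory lock for
    share_id, False if another session (spider) already holds it.
    share_id is assumed to be an integer key identifying the share.
    """
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (share_id,))
    return bool(cursor.fetchone()[0])


def release_share(cursor, share_id):
    """Release the advisory lock taken by try_claim_share."""
    cursor.execute("SELECT pg_advisory_unlock(%s)", (share_id,))
    return bool(cursor.fetchone()[0])
```

A spider would skip any share for which try_claim_share returns False and call release_share once it has finished scanning. Session-level advisory locks are also released automatically when the database connection closes.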
Original issue reported on code.google.com by radist...@gmail.com on 23 Feb 2010 at 5:08