klmr92 / uguu

Automatically exported from code.google.com/p/uguu

[further optimization idea] don't update tsfullpath immediately after each scan #40

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
It seems that updating the index on tsfullpath is a very long procedure, so I
suggest removing the tsfullpath-filling query from the scan and executing it
on a schedule instead (e.g., twice per day).

We can introduce spider.py command-line options:
--no-fullpaths: skip updating tsfullpath immediately after the scan
--update-fullpaths: update tsfullpath
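
A minimal sketch of how these two options could be parsed, using the standard
argparse module. Only the option names come from this issue; the function
names build_parser and plan are hypothetical, and so are the assumptions that
the two options are mutually exclusive and that --update-fullpaths runs the
update without scanning.

```python
import argparse


def build_parser():
    """Build a parser for the two proposed options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="spider.py")
    # Assumption: the two options exclude each other.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--no-fullpaths",
        action="store_true",
        help="scan, but skip the tsfullpath update",
    )
    group.add_argument(
        "--update-fullpaths",
        action="store_true",
        help="update tsfullpath (assumed here to run without scanning)",
    )
    return parser


def plan(args):
    """Return a (do_scan, do_update_fullpaths) pair for the parsed options."""
    if args.update_fullpaths:
        return (False, True)
    # Default behaviour: scan and update right away, unless --no-fullpaths.
    return (True, not args.no_fullpaths)
```

With this layout a scheduler could call "spider.py --update-fullpaths" twice
per day, while regular scans run with "--no-fullpaths".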

The query "CREATE INDEX files_tsfullpath ON files USING gin(tsfullpath);"
takes about 220 seconds. So, when updating tsfullpath, we could improve
performance by dropping the index first and recreating it afterwards (this
requires uguuscript to be the owner of the table files, or saving uguu's
password somewhere).
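
The drop-and-recreate order could be sketched roughly as below. The CREATE
INDEX statement is the one quoted in this issue; the function name
rebuild_fullpaths is hypothetical, and the tsfullpath-filling query is not
shown in this issue, so the caller has to pass it in. The cursor is assumed to
be any DB-API cursor connected as a role that owns the table files.

```python
# Statement quoted in the issue.
CREATE_INDEX = "CREATE INDEX files_tsfullpath ON files USING gin(tsfullpath);"
# Counterpart that removes the same index before the bulk update.
DROP_INDEX = "DROP INDEX IF EXISTS files_tsfullpath;"


def rebuild_fullpaths(cursor, fill_query):
    """Drop the index, fill tsfullpath, then recreate the index.

    fill_query is the existing tsfullpath-filling query; it is not part of
    this issue, so it is supplied by the caller. Committing is left to the
    caller as well.
    """
    cursor.execute(DROP_INDEX)    # no index maintenance during the update
    cursor.execute(fill_query)    # bulk update of tsfullpath
    cursor.execute(CREATE_INDEX)  # one rebuild (about 220 seconds per the issue)
```

Searches on tsfullpath cannot use the index between the drop and the
recreate, which is one more reason to run this on a schedule and not after
every scan.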

This optimization will give us a chance to run multiple spiders
simultaneously, but we should use some protection when selecting shares, so
that two spiders don't pick the same share.
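
One possible form of such protection is a PostgreSQL advisory lock per share.
This is only a sketch and not something taken from the uguu code: the function
name claim_share is hypothetical, shares are assumed to have integer ids, and
the cursor is assumed to be a psycopg2-style DB-API cursor (the "%s"
placeholder style).

```python
def claim_share(cursor, candidate_ids):
    """Return the first share id this session managed to lock, or None.

    candidate_ids is an iterable of integer share ids (an assumption about
    the schema). pg_try_advisory_lock does not block: it returns true if the
    lock was obtained and false if another session already holds it.
    """
    for share_id in candidate_ids:
        cursor.execute("SELECT pg_try_advisory_lock(%s);", (share_id,))
        if cursor.fetchone()[0]:
            return share_id
    return None
```

A session-level advisory lock is held until it is released with
pg_advisory_unlock or the session ends, so a spider that crashes does not keep
its share locked forever.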

Original issue reported on code.google.com by radist...@gmail.com on 23 Feb 2010 at 5:08

GoogleCodeExporter commented 9 years ago
doesn't make sense anymore

closed

Original comment by ruslan.savchenko on 4 Apr 2010 at 2:19