remram44 / yoppi

An automatic FTP indexer written in Python. Inspired by Yoshi Indexer (written in PHP).
GNU General Public License v3.0

Optimize FTP indexing #3

Closed: remram44 closed this issue 12 years ago

remram44 commented 12 years ago

Right now the number of database requests is very high (1 to 2 per file!), which makes indexing take a very long time.

remram44 commented 12 years ago

If we give up the UPDATE approach, we can instead do bulk DELETEs and INSERTs at the end of the indexing.

This takes only 3 requests: a SELECT on all of the server's files, a DELETE for the missing/modified files and an INSERT for the added/modified files.
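A minimal sketch of those three requests, assuming an SQLite database with a hypothetical `files` table (`id`, `server`, `path`, `size`, `mtime`); yoppi's actual schema and database backend may differ. The DELETE and INSERT are each sent as a single statement by expanding the placeholder list.

```python
import sqlite3


def load_files(conn, server):
    """Request 1: fetch every indexed file of the server in one SELECT.

    Returns a dict path -> (file_id, size, mtime). Column names are hypothetical.
    """
    cur = conn.execute(
        "SELECT id, path, size, mtime FROM files WHERE server = ?", (server,))
    return {path: (file_id, size, mtime) for file_id, path, size, mtime in cur}


def sync_server(conn, server, to_delete, to_insert):
    """Requests 2 and 3: one bulk DELETE and one bulk INSERT.

    to_delete: list of primary keys of missing/modified files
    to_insert: list of (path, size, mtime) tuples for added/modified files
    """
    with conn:  # one transaction: commits on success, rolls back on error
        if to_delete:
            marks = ",".join("?" * len(to_delete))  # "?,?,?" for 3 ids
            conn.execute(f"DELETE FROM files WHERE id IN ({marks})", to_delete)
        if to_insert:
            # multi-row VALUES clause: one "(?, ?, ?, ?)" group per file
            rows = ",".join("(?, ?, ?, ?)" for _ in to_insert)
            params = [value
                      for path, size, mtime in to_insert
                      for value in (server, path, size, mtime)]
            conn.execute(
                f"INSERT INTO files (server, path, size, mtime) VALUES {rows}",
                params)
```

For a large server, the placeholder lists would have to be split into chunks, since SQLite caps the number of bound parameters per statement.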

All of the server's files are loaded into Python before walking the FTP server, and each one is removed from memory as it is found on the server; the trade-off is higher memory consumption and a lot of dict lookups. A modified file is DELETEd and INSERTed again, which means its primary key changes.

Because the files are held in memory until the requests are fully built, the 'old' field is no longer needed.

remram44 commented 12 years ago

Implemented in 910e2efbe0930dbc8eb3150090c09089e3b98066