Closed remram44 closed 12 years ago
If we give up the UPDATE approach, it is possible to do bulk DELETEs and INSERTs at the end of the indexing.
There are only 3 requests: a SELECT on all the files, a DELETE for the missing/modified files and an INSERT for the added/modified files.
All the files recorded for the server are loaded into Python before walking the FTP, and removed from that in-memory set as they are found on the server. This means more memory consumption and a lot of dict lookups. A modified file gets DELETEd and INSERTed again, which means its primary key changes.
Because the files are held in memory until the requests are fully populated, the 'old' field is no longer needed.
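For illustration, here is a minimal sketch of the approach described above. It is not the code from the commit: it assumes `sqlite3`, a hypothetical `files(id, server, path, size)` table, and a hypothetical `walked` iterable of `(path, size)` tuples produced by the FTP walk, with a size change standing in for "modified".

```python
import sqlite3


def reindex(conn, server, walked):
    """Sync the `files` rows of `server` with what the walk found.

    The schema, the function name and the (path, size) tuples are all
    assumptions made for this example.
    """
    # Request 1: load every known file of this server into a dict.
    known = {
        path: (file_id, size)
        for file_id, path, size in conn.execute(
            "SELECT id, path, size FROM files WHERE server = ?", (server,)
        )
    }

    to_delete = []
    to_insert = []
    for path, size in walked:
        old = known.pop(path, None)  # remove entries as they are found
        if old is None:
            to_insert.append((server, path, size))  # added file
        elif old[1] != size:
            to_delete.append((old[0],))             # modified: drop old row
            to_insert.append((server, path, size))  # and insert it again
        # unchanged file: nothing to do

    # Whatever is still in `known` was not seen during the walk: missing.
    to_delete.extend((file_id,) for file_id, _ in known.values())

    # Requests 2 and 3: bulk DELETE, then bulk INSERT, in one transaction.
    with conn:
        conn.executemany("DELETE FROM files WHERE id = ?", to_delete)
        conn.executemany(
            "INSERT INTO files (server, path, size) VALUES (?, ?, ?)",
            to_insert,
        )
    return len(to_delete), len(to_insert)
```

Note that `executemany` still runs the statement once per row, just batched in a single call and transaction; a literal single DELETE would need something like `WHERE id IN (...)`. Since unchanged files are never touched, only added and modified files get a new primary key.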
Implemented in 910e2efbe0930dbc8eb3150090c09089e3b98066
Right now the number of requests is very high (1~2 per file!), causing very long indexing times.