nortonandrews / kikoeru

Self-hosted web media player for DLsite voice works.
GNU General Public License v3.0

performScan runs too many instances of processFolder in parallel #5

Closed NANELLON closed 5 years ago

NANELLON commented 5 years ago

When running the scanner on folders with several hundred works, it runs for about a minute and then fails with this error:

! ERROR while performing scan: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

This seems to be because each call to processFolder requires a database connection. Knex has a default pool size of 10 connections and a default timeout of 60 seconds, so almost immediately there are instances of processFolder waiting for a connection because the pool is full. With a big enough folder, some of them end up waiting more than 60 seconds, so the request times out and throws the error.
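For reference, the two settings involved can be written out explicitly. This is only a sketch of a Knex-style configuration object using the numbers quoted above; the `client` and `connection` values are placeholders, and the project's real configuration may look different.

```javascript
// Sketch of a Knex-style config object. The pool size and timeout mirror
// the numbers in this report; `client` and `connection` are placeholders.
const knexConfig = {
  client: 'sqlite3',                        // placeholder, not confirmed for this project
  connection: { filename: './db.sqlite3' }, // placeholder path
  pool: { min: 2, max: 10 },                // at most 10 connections handed out at once
  acquireConnectionTimeout: 60000,          // ms to wait for a free connection before erroring
};

// Any query issued while all `pool.max` connections are busy waits in a queue.
// If it waits longer than `acquireConnectionTimeout`, Knex rejects with the
// "Timeout acquiring a connection" error shown above.
```

Raising `pool.max` or the timeout would only postpone the failure on larger folders; limiting how many processFolder calls run at once addresses the cause.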

Needless to say, this also prevents the client from working while a scan is running. That is less of an issue, but it could be improved.

Another issue: with up to 10 instances of processFolder retrieving data from HVDB at the same time, the scan seems to put a decent load on the server. Loading HVDB pages while a scan is running is a bit slower than usual, and connections may end up timing out.

I recommend just processing each work synchronously (one at a time). It'll be slower for sure, but it'll make the process more reliable. If it's too slow, it could run 5 or so in parallel.

This issue may also affect performCleanup if many works have been removed.

NANELLON commented 5 years ago

Tested synchronous processing and it's way too slow, so perhaps running the processFolder calls in batches would be good. I vaguely remember you or someone else mentioning that they ran the scanner over 200 works at a time and it worked, but I think 8 would be worth a try, as it would leave a couple of connections free for the server if it's running at the same time as the scan.
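A minimal sketch of the idea, assuming processFolder is an async function that takes one folder. It uses a small worker pool rather than strict batches, so a slow work does not hold up the rest of its batch. `runLimited` is a hypothetical helper name, not something from the project.

```javascript
// Sketch only: `runLimited` is a hypothetical helper, and `worker` stands in
// for the project's processFolder (assumed to return a promise).
async function runLimited(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next index and processes it until none remain.
  // JavaScript is single-threaded, so `next++` cannot race between runners.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `limit` runners; no more than that many workers are in flight.
  const count = Math.min(limit, items.length);
  const runners = Array.from({ length: count }, () => runner());

  // Rejects on the first worker error; runners already started keep draining
  // the list, so callers may want to catch errors inside `worker` instead.
  await Promise.all(runners);
  return results; // results are in the same order as `items`
}
```

With a pool of 10 connections, calling something like `await runLimited(folders, 8, processFolder)` would keep at most 8 connections busy and leave a couple free for the server.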

NANELLON commented 5 years ago

Performance testing with 100 works, no database, no images folder

| Test | Time |
| --- | --- |
| async (master) | 19s |
| sync | 180s |
| limited async (8 parallel) | 23s |

So limited async is acceptable. I ran this version over a folder with 2617 works and it completed with zero errors. It took ~10 minutes, which I'm OK with, considering it might have taken a similar amount of time if I'd had to restart the scanner multiple times in the process.

phantasmx commented 5 years ago

Can confirm, this fix works. Scanned 1888 works twice after deleting db and images, no errors, about 5 min each scan.

nortonandrews commented 5 years ago

Thanks NAN! Took me a while to be able to test this but everything works correctly. Fix is already included in the newest release.