hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

crashing with many galleries downloading #1044

Open alidan opened 2 years ago

alidan commented 2 years ago

Hydrus version

v467

Operating system

Windows other (specify in comments)

Install method

Installer

Install and OS comments

windows 7, 64gb of ram, 1700 amd stock

Bug description and reproduction

Hydrus has crashed for several versions now during mass gallery downloads, with anywhere from 92 to 390 queries running.

I'm not sure what I might be doing other than mass downloading, but the program just stops working.

Unless the program was slowly crashing for 20+ hours, nothing in the log helps pinpoint what caused the crash. It also happened when the session weight was 10-15 million lighter, so I'm not sure session weight is the cause right now. For now, no data is being lost; the program just seems to give up every now and then.

Log output

v467, 2022/01/08 04:35:33: Import folder The archive imported 96 files.
v467, 2022/01/08 08:58:18: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:18: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:18: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:18: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:18: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:18: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:56: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:56: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:56: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:56: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:56: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:58:56: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 08:59:50: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 09:00:44: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 09:01:11: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 14:10:55: database maintenance - analyzing

done!
v467, 2022/01/08 21:19:04: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 21:19:59: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/08 22:56:25: PIL\TiffImagePlugin.py:819: UserWarning: Corrupt EXIF data.  Expecting to read 4 bytes but only got 0. 
v467, 2022/01/09 00:49:09: hydrus client started
v467, 2022/01/09 00:49:10: booting controller…
v467, 2022/01/09 00:49:10: booting db…
v467, 2022/01/09 00:49:21: checking database
v467, 2022/01/09 00:49:26: preparing db caches
v467, 2022/01/09 00:49:26: initialising managers
v467, 2022/01/09 00:49:27: booting gui…
v467, 2022/01/09 00:49:27: starting services…
v467, 2022/01/09 00:49:28: Running "client api" on port 45869.
v467, 2022/01/09 00:49:28: services started
v467, 2022/01/09 00:51:14: Your session weight is 32,911,398, which is pretty big! To keep your UI lag-free, please try to close some pages or clear some finished downloaders!
v467, 2022/01/09 01:00:53: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError: 

v467, 2022/01/09 01:42:06: Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError:

hydrusnetwork commented 2 years ago

Hey, I am sorry for being late to getting back to this.

I am not totally sure what is going on here, beyond the rough idea that downloaders scale pretty badly right now. Each gallery you have doing work puts a couple of repeating jobs into the job pool, and while my network and downloader schedulers try to enforce a bottleneck to keep it so only 10 or 20 downloaders can actually be executing at once, adding hundreds or close to a thousand jobs that are mostly just skipping still seems to add some laggy overhead.

What I really need to do is rewrite some core downloader routines to be async, event-driven code, so no downloader needs its own thread to shepherd its progress and we can just have one uber-manager handling it all on a single dedicated thread.
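As a rough illustration of that event-driven shape, here is a minimal sketch using Python's standard `asyncio`. It is not hydrus code: `run_downloader`, `manage`, `run_all`, and `MAX_ACTIVE` are hypothetical names, and the short sleep stands in for real network work. Every downloader is a coroutine on one event loop, and a semaphore plays the role of the bottleneck that lets only a handful do work at once, so idle downloaders cost no thread each.

```python
import asyncio

# Assumption: mirrors the "only 10 or 20 at once" bottleneck mentioned above.
MAX_ACTIVE = 10


async def run_downloader(name, limiter, stats):
    # Waiting downloaders park here as suspended coroutines, not threads.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for gallery/file network I/O
        stats["active"] -= 1
    return name


async def manage(query_names, max_active=MAX_ACTIVE):
    # One manager coroutine owns the limiter and schedules every downloader.
    limiter = asyncio.Semaphore(max_active)
    stats = {"active": 0, "peak": 0}
    finished = await asyncio.gather(
        *(run_downloader(name, limiter, stats) for name in query_names)
    )
    return list(finished), stats["peak"]


def run_all(query_names, max_active=MAX_ACTIVE):
    # Everything runs on the single thread that calls asyncio.run().
    return asyncio.run(manage(query_names, max_active))
```

Calling `run_all([f"query {i}" for i in range(400)])` would queue 400 downloaders while never letting more than `MAX_ACTIVE` run their network step at the same time.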

All that said, I am not sure why a crash is occurring. Most of the time when the job system gets overloaded, it just causes lag or a longer lockup. I know some madlads who have pasted 500 query texts into a page, and while it makes things run mega slow, they haven't reported crashing so much. Almost all crashes come from UI code being unhappy, so it sounds like the lag here is causing a delayed update event in UI or similar (e.g. maybe an interaction between the multi-column list showing these many downloaders working and them being slow to respond), and the program is running a bit of bad code at the wrong time.

Since diagnosing the cause is tricky, I am afraid I cannot easily solve this in the short term. Please use the client with only 20-40 downloaders running at once and check whether the program still crashes. If we know that is stable, then we'll definitely know the problem here is in many downloaders and not something else. The 'twisted' stuff in your log here is probably unrelated. It is Client API stuff, I think probably your web browser closing some requests early.

alidan commented 2 years ago

I put about 1000 queries into the program when I did an absolutely massive download of tags and artists I liked. That only caused the program to stop saving the session; it never crashed outright.

That said, I do have a suggestion for an interim fix: have the gallery work as an ordered sequence, searching for files first and then downloading, with a hard limit of only 2-3 queries at a time. This seems to stop any crashing for me, although that may also be because the program doesn't have enough time to crash... The last single large download I did, which had 6000 images and was the sole thing running, didn't have an issue, and when I import several hundred chan threads to download there doesn't seem to be an issue either, just with galleries. It's possible that, due to a ~150 image limit on the chan (no idea if git has things in place to cause issues, so I'm using shortened terms), the program never had a problem there, but it does once it hits larger image counts of 1000+. When I made this issue, I had just finished a massive reddit subreddit download and had just started on another booru. The reddit parser probably didn't cause the issue, since the 621 one also had issues with crashing on too many