hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

restart and losing some data #816

Closed alidan closed 3 years ago

alidan commented 3 years ago

I did a massive download from gelbooru that just finished this morning. Since hydrus was eating 18GB of RAM, I restarted it and let it do some maintenance. When it started back up, it had lost the last 6 days of watchers and the progress the massive download made. The files themselves still seem to be there: the db sees them, knows where they came from, and displays them fine. It's just that, from a watcher perspective, the progress and the last 6 days are gone.

Environment

Hydrus version: 428 (held off on upgrading until after the download was done)
Platform: Windows 7 64-bit
OS specifics: Windows 7 64-bit

installer

What happens

I did a massive download from gelbooru that just finished this morning. Since hydrus was eating 18GB of RAM, I restarted it and let it do some maintenance. When it started back up, it had lost the last 6 days of watchers and the progress the massive download made. The files themselves still seem to be there: the db sees them, knows where they came from, and displays them fine. It's just that, from a watcher perspective, the progress and the last 6 days are gone.

Steps to reproduce

1. Restart and lose watcher/gallery data?

Issues without reproduction steps are likely to stall.

Desired outcome

Any indication of why this happened, or of what to do to avoid losing the watchers.

Error message / Log file / Screenshots

Um... I just looked at my log file for this month... it's 3.2GB and I can't even open it to look at what may be wrong. client_2021-02-21_12-34-55 is a screenshot with a bit more info.

alidan commented 3 years ago

To add: I have about a 2,000-3,000 file discrepancy between my thumbnails and what the program displays under 'how boned am I'. I'm assuming this has to do with a database mess-up a few months back, when I had to load from a backup and never got around to doing a file find.

So, good news: the files all seem to be there. The bad: watchers and galleries are kind of screwed. I put some watchers in one of them, and I'm going to shut down and restart once it gets the files from those, and rename the 2021-02 log so I get a new one and can see any errors going forward.

alidan commented 3 years ago

Shutting down caused the log to jump to 4.2GB. I created a new one; once it cycles on and I can see whether anything remembered what I added, I will shut down again and hopefully get a smaller log that is actually openable.

alidan commented 3 years ago

It seems to have not remembered anything. I tried shutting down and it hung indefinitely, so I killed it and opened it again. This time I tried to save a pop (page of pages) session, to see whether I can do that in case I need to make a relatively clean-slate session. On shutdown, the log didn't freak out this time. I'm going to add one watcher, let it finish, and see if it freaks out on shutdown. Regardless, I'm going to update to the newest version after that and see if the issue persists.

If it does, and I'm able to make a saved session, I'm going to trim everything down to a relative minimum and see if I can go from there.

SO FAR on opening it this time, it remembered a name change to a tab and a rating change, so small bonuses. It also saved the test saved session.

alidan commented 3 years ago

OK:
restart 1 - lost between 5 and 6 days of file acquisition and had a 3.2GB log
restart 2 - lost some progress on getting everything up to speed and got another GB added to the log
restart 3 - failed to restart, so force-killed the process
restart 4 - it remembered a saved session, a rating, and a name change
restart 5 - seems to remember what's in watchers

It's possible the program fixed itself. Now shutting it down to update; after the update I'm going to add a sizeable number of threads to one of the watchers and see what happens from there.

alidan commented 3 years ago

OK, I went to sleep, woke up, and had a 1GB log file.

I think I know what's causing it; it may be gelbooru. This seems to be directly related to the problem I had in https://github.com/hydrusnetwork/hydrus/issues/812. I'm not able to open the log file.

On shutting down the program, the log bloated to 2GB. I'm attempting to zip it and I'll upload it somewhere; it's a ~350MB file that decompresses to 1.8-1.9GB. https://mega.nz/file/S8pHHaCT#6NIbHTVIBv2ZHHS2mo_2Vvm6KYbrSSs-PyD5lQPHTIg

If that fails, here is a traceback from what hydrus would tell me:


DBException
InterfaceError: Error binding parameter 4 - probably unsupported type.
Traceback (most recent call last):
  File "hydrus\core\HydrusThreading.py", line 401, in run
    callable( *args, **kwargs )
  File "hydrus\client\gui\ClientGUI.py", line 4576, in do_it
    controller.SaveGUISession( session )
  File "hydrus\client\ClientController.py", line 1596, in SaveGUISession
    self.WriteSynchronous( 'serialisable', session )
  File "hydrus\core\HydrusController.py", line 863, in WriteSynchronous
    return self._Write( action, True, *args, **kwargs )
  File "hydrus\core\HydrusController.py", line 247, in _Write
    result = self.db.Write( action, synchronous, *args, **kwargs )
  File "hydrus\core\HydrusDB.py", line 1088, in Write
    if synchronous: return job.GetResult()
  File "hydrus\core\HydrusData.py", line 1855, in GetResult
    raise e
hydrus.core.HydrusExceptions.DBException: InterfaceError: Error binding parameter 4 - probably unsupported type.

Database Traceback (most recent call last):
  File "hydrus\core\HydrusDB.py", line 668, in _ProcessJob
    result = self._Write( action, *args, **kwargs )
  File "hydrus\client\db\ClientDB.py", line 20529, in _Write
    elif action == 'serialisable': self._SetJSONDump( *args, **kwargs )
  File "hydrus\client\db\ClientDB.py", line 16742, in _SetJSONDump
    self._c.execute( 'INSERT INTO json_dumps_named ( dump_type, dump_name, version, timestamp, dump ) VALUES ( ?, ?, ?, ?, ? );', ( dump_type, dump_name, version, object_timestamp, dump_buffer ) )
sqlite3.InterfaceError: Error binding parameter 4 - probably unsupported type.



I should note I have upgraded to client 429.

It seems like the watchers got left alone for now. I'm not 100% sure what the gallery is doing, whether it got everything sorted out or lost data. I think I lost some time on the auto-showing importers, but I lost none of the files; they are all still registered in the db.

For now I'm pausing everything on gelbooru, as I think that may be the cause. Going to load up the other watchers and see if they report an error over the next 24 hours.

Small edit: I forgot to paste the traceback.

Zweibach commented 3 years ago

If you're interested in having a look at the logs yourself, there are specialised tools for reading large text files.

The integrated Text-Viewer of Total Commander can open huge files (>10GB) for viewing without any problems. It also provides different views, e.g. a Hex-View.

Total Commander can supposedly handle pretty big ones, and there are a bunch of others I found when googling "tool for reading large log files".
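For a quick look without a specialised viewer, a few lines of Python can also pull just the tail of a multi-GB log without loading the whole file into memory. This is a generic sketch, not anything hydrus-specific:

```python
# Read only the last chunk of a very large log file.
import os

def tail_bytes(path, n=64 * 1024):
    """Return roughly the last n bytes of a file, decoded leniently."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - n))
        # errors='replace' keeps a possibly mid-character cut from crashing the decode
        return f.read().decode('utf-8', errors='replace')

# Example usage (path is hypothetical):
# print(tail_bytes(r'client - 2021-02.log'))
```

Seeking from the end means a 4GB log costs the same as a 4KB one to inspect.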

alidan commented 3 years ago

OK, looked at it... I have no idea what the hell it's even doing. The gelbooru part looks like it was a few corrupted images, and from there the log just kind of flips out and does god knows what.

From the 4GB file:

2021/02/15 21:14:46: QBackingStore::endPaint() called with active painter; did you forget to destroy it or call QPainter::end() on it?

That's just going on and on, a few errors, and then the full freak-out. I think I'm going to compress that one too and upload it.

Zweibach commented 3 years ago

Huh, sounds like it might be the same error as #813

alidan commented 3 years ago

Possibly. The first log screams that this is probably an issue; however, I'm not 100% sure the subsequent log does. Then again, I may just not have noticed the painter thing because it wasn't 10k+ lines of it.

alidan commented 3 years ago

https://mega.nz/file/3xRBEYDL#qIycvvDVcDWyCTHSxQR4yhrxsTLq2pT4Q5s44Bc7nw0

There is the first 4GB log.

So far nothing new; the new log has entries, but they look like a corrupt image and cloudflare stuff.

alidan commented 3 years ago

Welp, it flipped out again, this time with nothing touching gel. Going to just add the watchers I deem necessary for now.

alidan commented 3 years ago

client_2021-02-22_17-11-31

(Same DBException traceback as above: InterfaceError: Error binding parameter 4 - probably unsupported type, raised from SaveGUISession via _SetJSONDump.)

This seems to be the error that then just causes the log to freak out. Not going to bother restarting; I want to see if it keeps having this error. I will just write off any watcher as a loss for tracking.

alidan commented 3 years ago

As of now, it seems like there is only one freak-out per instance of it happening. Regardless of what I do, the program will now function as normal, but if I restart, anything that happened after the freak-out userside/UI-wise is lost. When the new version comes out and I upgrade, I will save a pop and see whether the saved pop is remembered or not.

alidan commented 3 years ago

Small update: I just checked my subscription(s) (I only have one going), and it finds files every single day. This sub DOES remember everything it has done over the days since the error first showed up.

alidan commented 3 years ago

OK, shut down; it recorded another GB to the log. I updated and reopened the program. It lost all the watchers since the error; however, I made a page of pages and saved it before I closed the program, and this was not lost and could be reopened.

alidan commented 3 years ago

client_2021-02-25_16-40-42

(Same DBException traceback as above: InterfaceError: Error binding parameter 4 - probably unsupported type, raised from SaveGUISession via _SetJSONDump.)

Got it again, not long after I opened the program, but I was too tired to make an update.

alidan commented 3 years ago

client_2021-02-26_08-44-50

(Same DBException traceback as above: InterfaceError: Error binding parameter 4 - probably unsupported type, raised from SaveGUISession via _SetJSONDump.)

Restarted to make sure I could do something and the data wouldn't be lost.

hydrusnetwork commented 3 years ago

Thank you for this comprehensive report. I will read this through properly this week and see what I can do to improve or fix the situation.

Immediately one thing comes to mind, which may be related to your large memory use and that specific database error--is your client's 'session' very large? That 'massive' download, could it have had millions of URLs? I believe there is a limit of around 1 or 2GB for objects I save into the database, so if your session exceeded this size, I think that is the ugly error given. I was under the impression that a routine now caught this situation and gave you a nicer 'slim down your session now!' error, but perhaps that is not working.
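A minimal sketch of the kind of guard being described (names and table values here are hypothetical placeholders, not hydrus's actual code): check the serialised session's size against SQLite's blob limit before attempting the INSERT, so the user gets a readable "slim down your session" message instead of the opaque binding error.

```python
import sqlite3

# SQLite's default maximum blob/string length (SQLITE_MAX_LENGTH).
SQLITE_MAX_BLOB_BYTES = 1_000_000_000

def save_session_dump(conn, name, dump_bytes, limit=SQLITE_MAX_BLOB_BYTES):
    """Hypothetical guard: refuse to write a session blob SQLite cannot store."""
    if len(dump_bytes) > limit:
        raise ValueError(
            'Session "{}" serialised to {:,} bytes, which is over the {:,} byte '
            'SQLite blob limit - slim down your session!'.format(
                name, len(dump_bytes), limit))
    # dump_type/version/timestamp values below are placeholders for illustration.
    conn.execute(
        'INSERT INTO json_dumps_named ( dump_type, dump_name, version, timestamp, dump ) '
        'VALUES ( ?, ?, ?, ?, ? );',
        (3, name, 1, 0, sqlite3.Binary(dump_bytes)))
```

The point of the guard is that a friendly exception fires on the application side, before SQLite ever sees the oversized parameter.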

alidan commented 3 years ago

The download was entirely in galleries and was about 1TB in size, if I remember the given download amounts. I think I updated the client somewhere near the beginning of it and restarted it again somewhere in the middle. It was essentially every tag I liked on gel that I was not able to mass-download before. As for client weight, it's going on 2.9 million, with about 300k images open; I got very lazy in dealing with them.

So this is most likely caused by the large client weight, and dealing with that should stop it? Because if so, I can expedite a test and remove a good chunk of watchers.

hydrusnetwork commented 3 years ago

I believe so. There is supposed to be a thing that pops up when you hit the limit, perhaps this got lost in a wave of different error popups. I will investigate the whole issue and see about better safeguards.

There's a feature that may help you here: if on a downloader you right-click the little icon button that opens the file import status list window, it should make a menu on that right-click with an option like 'clear out completed items'. This helps you de-bloat a gigantic import queue that has had some work.

alidan commented 3 years ago

I like to keep completed downloads for a while so I can sift through them in the thread/tag form I wanted.

For future reference, at what session size does the program get iffy? I know it used to get crash-prone when it went over 165 tabs, but this is the first time I came across session weight being an issue for more than just locking the client up for a few seconds every now and then.

alidan commented 3 years ago

OK, got most of the thread watchers trimmed down and left some smaller ones. This has allowed the program to function, but not before it tossed another, hopefully final, 1GB log dump when I tried to save the entire session before any cull. I will hopefully be able to get the galleries and the mass of opened images dealt with soonish; I am making an effort now to deal with them sooner rather than later.

hydrusnetwork commented 3 years ago

@alidan I generally recommend people keep session below 150,000 weight as under the pages menu. Bigger than that, and you can start to get some judder (judder as in the UI starts to lock up from background CPU work on all the items in memory). The actual strict limit is about 500,000-1,300,000 I believe, depending on the type of weight in the session.

As far as I know, the real limit being hit here is 'session object serialises to >1GB of data', which hits a SQLite limit. However, I have looked over my code, and I am fairly confident you would have hit my code that catches this, so I am uncertain exactly what was going on here. You are on my Windows installer, so I think you should be hitting the same size limit, yet with a weight of about 2.9 million, you should have regularly had sessions larger than 1 GB.

Perhaps my understanding here is wrong. If the URLs were mostly watcher URLs, they would likely have less data than a richer booru URL with many tags and so on, so perhaps they serialise to less than ~370 bytes each (1GB/2.9M). Perhaps also, then, the limit is not 1GB but actually practically 1000MB or similar, and your session was hovering around the limit.

I have improved the error handling code here to say more about the error. Until I can have more granular session saving, I will increase warnings in the program around large sessions going forward. Please continue to let me know how you get on.
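The back-of-the-envelope estimate above checks out: a binary gigabyte spread over 2.9 million items of weight is roughly 370 bytes of serialised data per item.

```python
# Rough per-item budget if a 1 GiB serialised session holds ~2.9 million items.
ONE_GIB = 1_073_741_824  # 2**30 bytes
items = 2_900_000

bytes_per_item = ONE_GIB / items
print(round(bytes_per_item))  # roughly 370 bytes of serialised data per item
```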

hydrusnetwork commented 3 years ago

Damn, this appears to be correct; max size of a 'blob' is 1,000,000,000 bytes, not 1GB (1,073,741,824).

https://sqlite.org/limits.html

Your session must have grown slowly enough to be in that 7% gap. I have fixed this limit check. Thank you again for this report, I will continue to explore better handling and reporting in this area.
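For reference, the gap between SQLite's documented default limit and a binary gigabyte works out to just under 7%, which is the window the session was apparently sitting in:

```python
# SQLite's default max blob length vs a binary gigabyte (see sqlite.org/limits.html).
SQLITE_MAX_LENGTH = 1_000_000_000
ONE_GIB = 1_073_741_824  # 2**30

gap = ONE_GIB - SQLITE_MAX_LENGTH
print(gap)                            # 73741824 bytes
print(round(100 * gap / ONE_GIB, 1))  # about 6.9 percent of a GiB
```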

alidan commented 3 years ago

If you are up for suggestions: a warning at 25% headroom that size is becoming an issue, and then a 'get this sorted out now' warning at 10%, along with an explanation of what contributes to the size getting so big and what will help most to make it smaller.
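That suggestion could be sketched as a simple two-tier check (a hypothetical helper, not hydrus code): grade the serialised session size against the blob limit and escalate as headroom shrinks.

```python
# SQLite's default maximum blob length.
SQLITE_MAX_BLOB_BYTES = 1_000_000_000

def session_size_warning(size_bytes, limit=SQLITE_MAX_BLOB_BYTES):
    """Return a warning message based on remaining headroom before the blob limit,
    or None if the session is comfortably small."""
    headroom = (limit - size_bytes) / limit
    if headroom <= 0.10:
        return 'get this sorted out now'
    if headroom <= 0.25:
        return 'size is becoming an issue'
    return None
```

For example, a 950MB session would trip the urgent warning, an 800MB one the softer warning, and a 500MB one neither.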

Like I said, I like keeping threads and galleries around because parsing them in the context of threads makes things easier to go through than random searches. The majority of my files have 0 tags, so I can't reliably sort by tags. Take the 'choose your own adventure' threads, for instance: none of these are tagged in any way, outside of me adding a 'cyoa' tag, so they need to be bunched up in context so I can add name and version number tags to them.

For me, I'm currently just saving the watchers and gallery as a page of pages so I can open them up, parse through, then save them off sans what was parsed. Some of these were easier/priorities; some were significantly harder, for reasons I have addressed now. What would make this perfect, at least in my use case, is a way to save off the watchers to be recalled later that's not as taxing as just reloading a ~3,000-watcher-heavy page of pages, but at the same time I have no idea how many people have remotely the same issues.

As for judder, I get a solid 20-50 images before it hangs for a few seconds. Even when the program was breaking, it handled this FAR better than it used to, so while it may not be judder-free, it's still far more useful than it was in the past.

But now that I know the program breaks because of a very heavy session, I kind of want to start a clean one and load up some animations, to see whether my issues with mpv crashing the program are caused by a heavy load or whether it's just mpv crashing it regardless.

Yeah, I'm going through the process of getting that gallery sorted out, and finding the images is kind of eating session weight, much less 'acquiring' them, though that may be because most of the images are already in the archive now. I believe my largest thread watcher, which was ~3,000 watchers when it closed, gave me back 500-700k weight.

But yeah, I had a suspicion it may have been my stupidly large session, but I was not sure and didn't want to do anything while things were not saving. You see the error above, 'probably unsupported type'; if that had instead kicked back at any point with 'hey, your session is too large to save, deal with it', I would have known and been able to trim immediately.