jimmejardine / qiqqa-open-source

The open-sourced version of the award-winning Qiqqa research management tool for Windows
GNU General Public License v3.0

qiqqa crashing when generating autotags #283

Open quissicks opened 3 years ago

quissicks commented 3 years ago

Happy New Year to the qiqqa community! I have a very large library. I am running version V82.0.7579.33985. It crashes when I try to generate autotags.

GerHobbelt commented 3 years ago

Happy & healthy new year!

Re the issue: I'd much appreciate it if you could send the logfiles.

Did the crash happen again after restarting the application and regenerating the autotags, i.e. is the application crashing consistently?

On Sun, Jan 3, 2021, 17:23 quissicks notifications@github.com wrote:

Happy New Year to the qiqqa community! I have a very large library. I am running the most recent version posted by Ger. It crashes when I try to generate autotags.


GerHobbelt commented 3 years ago

Hi Chris,

I finally took the time to inspect your logfiles earlier today. I'm still going through them, as there's some other stuff in there that hints at other trouble. Anyway, we'll get to that.

From what I can see in the logfiles, the root cause is (with high probability) the autotag processing, and not something happening in the background that "just happens at the same time". The out-of-memory failure happens inside the LuceneNET library code while that library is busy updating the search index with the new autotags attached to each document. (The LuceneNET search index processes all PDF document texts plus all PDF text-based metadata: tags, BibTeX, title, etc.)

Thank you very much for sending the bundled logfiles; I'll bother you with a few more requests if that's okay:

Aside

It's not related to this issue, but I noticed a bunch of PDFs producing 'irregular' log output during OCR/text background processing for the search index updates, which translates to:

I'd like to have a look at those PDFs when time allows, if that's okay.


Back to the issue at hand

The short of it is that I don't have a quick fix for this right now.

Memory management in .NET applications isn't easy stuff; I'm considering how to tackle this before reaching my intended end result: a 64-bit Qiqqa with upgraded libraries. (#289, section "How much .NET memory is gobbled up by the Lucene search databases in current Qiqqa?")

From what I can see so far, the problem is caused by all the LuceneNET activity resulting from the set of AutoTags discovered and assigned to the documents. 🤔 I'm thinking about how to approach this problem and reduce the memory pressure in the application.

Current questions for you (@quissicks)

GerHobbelt commented 3 years ago

@quissicks : Hi Chris,

There's a new (test) release published at https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7649.30836 ; see description there. You can simply install it over your existing Qiqqa; if you want to revert to another Qiqqa version, you can install that version over the new one without trouble.

See also the last comment at #288 (the other issue this release is targeting) and the screenshot of the startup dialog there: it's not meant for your situation, so it's only an awareness bit. In your case I'm particularly interested in the new log files. Regrettably, I haven't been able to do anything serious about memory reduction yet. I have a few observations, also from my own testing, but it's pretty tough to pinpoint the culprits. Technically, it's more accurate to say that the culprits are easily found in a memory profiler; the big hurdle is coming up with ways to alleviate the memory pressure there. It's all the documents: they load their metadata into memory at the first "opportunity" where it's needed (e.g. when analyzing metadata in the background for auto-tagging, checking the indexing, etc.), and then Qiqqa isn't smart about it and doesn't know how to, say, "throw away" that data once the acute need for it has gone. Plus there's the curious observation in my own tests that, apparently, there are more PDF document 'instances' in memory than I have PDF documents in all my libraries, so that's another ho-hum-hum to research. That one has to be tested with a very small library (or set of libraries) to see if I can reproduce the 'too many' situation there and find out where it originates; doing that in a huge library is too cumbersome.

Anyway, this is just so you get a bit of a feel for what's been observed and know that work is being done. I cannot predict results yet, as I'm still in the 'finding out what's going on exactly' phase, while also realizing that some serious refactoring is required if I'm to detect high memory pressure and 'discard' older metadata. That metadata isn't timestamped yet, as these are all persistent stores, not 'caches' in the usual sense, where stuff comes in, gets a timestamp that's tracked and refreshed based on usage, and is then killed off when the cached stuff 'expires'.
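For readers who want the 'cache in the usual sense' spelled out, here is a minimal sketch in Python. It is not Qiqqa code (Qiqqa is C#/.NET), and the names `ExpiringCache`, `get`, `evict_expired` and the `loader` callback are hypothetical: each entry carries a last-used timestamp that is refreshed on every access, and entries idle longer than `ttl` seconds are dropped and reloaded from the persistent store the next time they are needed.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ExpiringCache:
    """Hypothetical sketch of a sliding-expiration cache (not Qiqqa's API)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl          # seconds an entry may sit unused before it expires
        self.clock = clock      # injectable clock, handy for deterministic tests
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, last used)

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, loading it on a miss; refresh its timestamp either way."""
        entry = self._entries.get(key)
        value = loader(key) if entry is None else entry[0]
        self._entries[key] = (value, self.clock())
        return value

    def evict_expired(self) -> int:
        """Drop every entry idle for longer than `ttl`; return how many were dropped."""
        now = self.clock()
        stale = [k for k, (_, last_used) in self._entries.items() if now - last_used > self.ttl]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo is deterministic
    cache = ExpiringCache(ttl=60.0, clock=lambda: now[0])
    load = lambda doc_id: {"id": doc_id}  # stand-in for reading document metadata from disk
    cache.get("doc-A", load)  # loaded at t=0
    now[0] = 30.0
    cache.get("doc-B", load)  # loaded at t=30
    now[0] = 70.0
    print(cache.evict_expired(), len(cache))  # doc-A idle 70s > 60s: prints "1 1"
```

In such a design, `evict_expired()` could be called periodically or whenever high memory pressure is detected; since the metadata is persisted anyway, an evicted entry costs only a reload, not data loss.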

No matter, ignore it if that's too geeky for you 😅

Have a go at the new version if you like; I'd be happy to see another set of logfiles. Thanks!

By The Way

Apologies for any 'rough edges' in the new one; I pushed the release out so it's here today and not, say, Friday or later. Real life and all that jazz. Ciao!

quissicks commented 3 years ago

Dear Ger, Thanks for this. I have now installed the new version. I will send you logs if there are any crashes. I really welcome your commentary. I used to do a lot of computing myself – I used to administer a cluster of Sun workstations and I did a lot of programming (I developed a very large simulation model for my PhD, which was coded from scratch). However, I am very out of date – I haven’t done much since 2006 when the Sun compilers were withdrawn (and porting to something else would have been a huge task). At some stage I must get round to programming in some contemporary language.

Thanks again, Chris.

From: Ger Hobbelt notifications@github.com Sent: 13 January 2021 02:09 To: jimmejardine/qiqqa-open-source qiqqa-open-source@noreply.github.com Cc: Chris Hicks chris.hicks@newcastle.ac.uk; Mention mention@noreply.github.com Subject: Re: [jimmejardine/qiqqa-open-source] qiqqa crashing when generating autotags (#283)


GerHobbelt commented 3 years ago

Quick heads up: new release to try: https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7655.37537

Please report anything you observe with the new release. Thanks!

quissicks commented 3 years ago

Thanks Ger, I am just installing it now.

Best wishes, Chris.

From: Ger Hobbelt notifications@github.com Sent: 16 January 2021 22:00 To: jimmejardine/qiqqa-open-source qiqqa-open-source@noreply.github.com Cc: Chris Hicks chris.hicks@newcastle.ac.uk; Mention mention@noreply.github.com Subject: Re: [jimmejardine/qiqqa-open-source] qiqqa crashing when generating autotags (#283)


GerHobbelt commented 3 years ago

Quick heads up: hotfix release to try: https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7656.6401 (which fixes a known issue in the previous release https://github.com/GerHobbelt/qiqqa-open-source/releases/tag/v83.0.7655.37537)

Please report anything you observe with the new release. Thanks!