jimmejardine / qiqqa-open-source

The open-sourced version of the award-winning Qiqqa research management tool for Windows
GNU General Public License v3.0

Migrate Qiqqa to 64 bit architecture to cope with large libraries, etc. (Future Plan) #289

Open GerHobbelt opened 3 years ago

GerHobbelt commented 3 years ago

Given #283 and a lot of other issues (haven't taken time to search issue database right now), this is a needed effort.

(TODO: edit this issue to include links to the relevant issue numbers below)

Unfortunately the libraries which keep us limited to 32bit .NET (and thus upper limit of ~ 1-1.5GB RAM usage) are both UI libraries: SORAX for PDF and XULrunner for the "embedded browser" used in the Qiqqa Sniffer (and a few other places in the software).

Another problem library is the old Lucene.NET we're still using.

The UI problems surface when using (very) large libraries, where out-of-memory issues pop up every so often.

Key idea developed during 2020 is to open up Qiqqa and split it up into separate components:


Before we go there, there's one thing on my mind that I haven't checked yet:

How much .NET memory is gobbled up by the Lucene search databases in current Qiqqa?

When you have a very large library (40-50K+ PDFs), I notice memory consumption quickly rising to ~1GB, after which performance degrades (due to frequent .NET garbage collection (GC) runs) and, when you're unlucky, you ultimately hit fatal out-of-memory errors (e.g. #283).

What I must check is: does it help significantly if I move the Lucene/Search Index work out of process? No need to immediately reach for SOLR there, but maybe I can come up with a minimal bit of work to arrive at a similar scenario (search engine as local server == out-of-process), where the Qiqqa core app *communicates* with the search engine instead of *incorporating* it...
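To make the out-of-process idea concrete, here is a minimal sketch in Python (not Qiqqa's actual C# code). Everything in it is an assumption for illustration: the toy `DOCUMENTS` index, the `/search?q=` endpoint, and the helper names are hypothetical. For brevity the "engine" runs in a background thread here; in the real scenario it would be a separate process, so that the index memory no longer counts against the core app's 32-bit address space.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse, parse_qs, urlencode
from urllib.request import build_opener, ProxyHandler

# Hypothetical toy index: document id -> extracted text.
# A real engine would own this data in its own process.
DOCUMENTS = {
    "doc1": "garbage collection in managed runtimes",
    "doc2": "full text search with inverted indexes",
    "doc3": "search engines as local servers",
}


def search_index(term):
    """Return sorted ids of documents containing the term (case-insensitive)."""
    needle = term.lower()
    return sorted(d for d, text in DOCUMENTS.items() if needle in text.lower())


class SearchHandler(BaseHTTPRequestHandler):
    """Engine side: answers GET /search?q=<term> with a JSON list of hits."""

    def do_GET(self):
        term = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        body = json.dumps({"hits": search_index(term)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server():
    """Start the engine on a free local port; return (server, port)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def query_search_server(port, term):
    """Core-app side: ask the engine over HTTP and hold no index in memory."""
    url = f"http://127.0.0.1:{port}/search?" + urlencode({"q": term})
    opener = build_opener(ProxyHandler({}))  # talk to localhost directly
    with opener.open(url) as response:
        return json.loads(response.read().decode("utf-8"))["hits"]
```

The point of the split is that the core app only ever sees small JSON responses, while the index and its memory pressure live on the other side of the HTTP boundary.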

GerHobbelt commented 3 years ago

Conclusion after tonight: Lucene.NET is out. Too much effort; I can't touch it without breaking things. That may well be down to me and my ways; I don't mind. My time is better spent on kicking up a real SOLR instance and kicking its tires, learning to get that one flying with Qiqqa. That's where I want to go with this whole endeavour anyway: opened-up search access so folks can do their own creative processing of the PDF content and metadata fed into the engine by Qiqqa. Qiqqa shouldn't be the only channel into your metadata.
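As a rough illustration of what "opened-up search access" could look like from outside Qiqqa, the sketch below builds a query URL for Solr's standard `select` request handler. The core name `qiqqa` and the field names `title` and `author` are purely hypothetical; nothing in this issue fixes a schema. Only the URL is built here, so no running Solr instance is required.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, rows=10, fields=None):
    """Build a URL for Solr's /select handler using common query parameters."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # 'fl' restricts which stored fields are returned per document.
        params["fl"] = ",".join(fields)
    return f"{base_url.rstrip('/')}/solr/{core}/select?" + urlencode(params)


# Hypothetical core and field names, for illustration only.
url = build_solr_select_url(
    "http://localhost:8983", "qiqqa", "title:lucene", rows=5,
    fields=["title", "author"],
)
```

Any script or tool able to issue an HTTP GET could then query the same metadata that Qiqqa feeds into the engine, without going through Qiqqa itself.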

Thinking about #261 and other 'complexities' here.

GerHobbelt commented 3 years ago

Related: #23