GerHobbelt closed this issue 5 years ago.
From #13
16: Qiqqa failed on several occasions with my large PDF collection, causing a permanent and total failure of its search feature, i.e. the Lucene database got nuked/b0rked. All subsequent searches in Qiqqa would quickly return ZERO results.
Reindexing via the Qiqqa Tools panel would have no effect.
Tools > Qiqqa Configuration > Troubleshooting > Rebuild Library Search Indices
Manually deleting all the Lucene DB files in
base/Guest/index/
would also be to no avail.

Reconstructing the Library by importing the PDF files in tiny batches via the Directory Watch feature of Qiqqa would result in 'semi-random behaviour': it turns out to be highly dependent on which PDF files got loaded first; as soon as an offending PDF (to be uploaded later) was included in the library, the Lucene-backed search facility would break down and stop functioning.
Note: the pending investigation points at least to #11; at the time of this writing, #11 has been fixed, which was a required first step towards making the Lucene-backed search feature work and (re)generate a working search index once again.
Done as per #33.
Commits:
Revision: d58bd7aed030e17361752ce539373aad68e8f973
Revert debug code that was part of commit SHA-1 89307edfe7d5ba2b6de050de969d2910b147e682 -- some invalid BibTeX was crashing the Lucene indexer (AddDocumentMetadata_BibTex() would b0rk on a NULL key). That problem was fixed in that commit at a higher level (in PDFDocument).
Revision: 89307edfe7d5ba2b6de050de969d2910b147e682
Some invalid BibTeX was crashing the Lucene indexer (AddDocumentMetadata_BibTex() would b0rk on a NULL key).
Sample invalid BibTeX:
@empty = delete?
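To make the failure mode concrete, here is a minimal sketch of the kind of null-key guard involved. The names (BibTexIndexingGuard, AddBibTexField) are hypothetical and the Lucene.NET 3.x-style Field API is assumed; the actual fix was applied at a higher level in PDFDocument, so this only illustrates the idea, not the real Qiqqa code:

```csharp
using Lucene.Net.Documents;

static class BibTexIndexingGuard
{
    // The failure mode, roughly: a BibTeX entry like "@empty = delete?" yields
    // a null/empty field key, and constructing a Lucene Field with a null name
    // throws, aborting the indexing pass for the whole library.
    public static void AddBibTexField(Document doc, string key, string value)
    {
        // Skip malformed fields instead of letting them take the indexer down.
        if (string.IsNullOrWhiteSpace(key) || value == null)
        {
            return;
        }

        doc.Add(new Field(key.ToLower(), value, Field.Store.NO, Field.Index.ANALYZED));
    }
}
```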
Revision: 8a1d7660659079939e59be74bf3822ea6311a205
Fix https://github.com/jimmejardine/qiqqa-open-source/issues/17 by processing PDFs in any Qiqqa library in small batches, so that Qiqqa is not unresponsive for a loooooooooooooong time when it is re-indexing/upgrading/whatever a large library, e.g. 20K+ PDF files. The key here is to make the 'infrequent background task' produce some result quickly (like a working, yet incomplete, Lucene search index DB!) and then update/augment that result as time goes by. This way, we can recover a search index for larger Qiqqa libraries!
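A rough sketch of that batching idea, assuming Lucene.NET's IndexWriter.Commit() and hypothetical helpers (BatchedLibraryIndexer, BuildLuceneDocument); the real Qiqqa background-task plumbing is more involved:

```csharp
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Documents;
using Lucene.Net.Index;

static class BatchedLibraryIndexer
{
    // Index a large library in small chunks, committing after every chunk so a
    // usable (if still incomplete) search index exists almost immediately and
    // keeps growing while the background task works through the backlog.
    public static void IndexLibraryInBatches(IndexWriter writer,
                                             IList<string> pdfFingerprints,
                                             int batchSize = 50)
    {
        for (int offset = 0; offset < pdfFingerprints.Count; offset += batchSize)
        {
            foreach (string fingerprint in pdfFingerprints.Skip(offset).Take(batchSize))
            {
                writer.AddDocument(BuildLuceneDocument(fingerprint));
            }

            // Flush this batch to disk: searches now see everything indexed so
            // far instead of waiting for the whole 20K+ document pass to finish.
            writer.Commit();
        }
    }

    // Stand-in for whatever turns a PDF's metadata and OCR text into a Lucene
    // Document; only here to keep the sketch self-contained.
    private static Document BuildLuceneDocument(string fingerprint)
    {
        var doc = new Document();
        doc.Add(new Field("fingerprint", fingerprint,
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}
```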
Closing and decluttering the issue list so it stays workable for me: fixed in https://github.com/GerHobbelt/qiqqa-open-source mainline=master branch, pending #15 / any maintainer rights/actions.
I've had this problem many times over the years with a 20K+ docs db. (using v76-80 (github release))