Open serhaton opened 3 years ago
EDIT:
I suspected the problem might be related to running Lucene on Windows, so I installed CouchDB and couchdb-lucene on Ubuntu 20.04 Server. The result is the same.
Everything works fine until I upload a docx or pptx document. It works fine with doc, rtf, txt and pdf files.
I am really stuck with this problem.
I am using couchdb-lucene 2.2.0 installed on Windows Server 2019. The CouchDB version is 3.1.1.
Full-text searching works fine with document properties. I also wanted to index the content of the documents' attachments, so I configured the design document as follows:
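For illustration, an attachment-indexing function in couchdb-lucene generally has the shape below. This is a minimal sketch, not my exact design document: the `title` property, the `by_content` index name, the `_design/search` id and the stub `Document` class are hypothetical, and the `add`/`attachment` calls are assumed to behave as described in the couchdb-lucene README.

```javascript
// Stub standing in for the Document class that couchdb-lucene provides
// to index functions. It only records calls so the sketch runs under Node.
class Document {
  constructor() {
    this.fields = [];
    this.attachments = [];
  }
  add(value, options) {
    this.fields.push({ value: value, options: options });
  }
  attachment(field, name) {
    this.attachments.push({ field: field, name: name });
  }
}

// Hypothetical index function: indexes a `title` property (assumed name)
// and asks couchdb-lucene to extract text from every attachment.
const indexWithAttachments = function (doc) {
  var ret = new Document();
  if (doc.title) {
    ret.add(doc.title, { field: "title" });
  }
  for (var name in doc._attachments) {
    ret.attachment("default", name);
  }
  return ret;
};

// Hypothetical design document. couchdb-lucene reads index functions,
// stored as strings, from the "fulltext" member.
const designDoc = {
  _id: "_design/search",
  fulltext: {
    by_content: { index: indexWithAttachments.toString() }
  }
};
```

With a function of this kind, every attachment is handed to the text extractor, so a single attachment that the extractor cannot process can affect the whole index.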
When I upload attachments of type pdf, txt or Word (doc), everything works as expected. Below is the search result for the keyword "Sesame Street" in a ppt document, which works fine:
Then I upload any docx file (even a nearly empty one with only some plain text; for this specific problem my docx contains only the text 'This is an example document which I have indexing problem on couchdb-lucene') or a pptx attachment to any of the documents and re-run the above request. It gives a timeout error forever.
The log shows only the message below:
If I try to search with the keyword 'problem', which is in the Word document, the result is the same timeout.
If I try with stale=ok, it responds with an empty result.
So indexing is somehow stuck forever. Restarting couchdb-lucene does not change anything. If I delete the document with the docx file from CouchDB and then restart couchdb-lucene, everything starts working again.
I believe the problem is related to zip-format documents such as docx, xlsx and pptx.
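One way to check which attachments fall into this group is to look at their first bytes: docx, xlsx and pptx files are ZIP containers and start with the signature `PK\x03\x04`, while legacy doc and ppt files use the OLE compound format and pdf files start with `%PDF`. The sketch below is only a diagnostic helper I am assuming for illustration (`looksLikeZip` is a hypothetical name), not part of couchdb-lucene.

```javascript
// ZIP local file header signature: the bytes "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the given bytes (Buffer, Uint8Array or plain array)
// begin with the ZIP signature, as docx/xlsx/pptx files do.
function looksLikeZip(bytes) {
  if (bytes.length < ZIP_SIGNATURE.length) {
    return false;
  }
  return ZIP_SIGNATURE.every(function (expected, i) {
    return bytes[i] === expected;
  });
}

// Example inputs: the start of a ZIP container, of a PDF ("%PDF"),
// and of a legacy OLE compound file such as doc or ppt.
const zipHeader = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
const pdfHeader = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d]);
const oleHeader = Uint8Array.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1]);

console.log(looksLikeZip(zipHeader));
console.log(looksLikeZip(pdfHeader));
console.log(looksLikeZip(oleHeader));
```

Under Node, the same function can be applied to the result of `fs.readFileSync(path)` for a downloaded attachment, which makes it easy to confirm whether every file that blocks the index is a ZIP container.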