xelkano / redmine_xapian

This plugin allows searches across attachments with the Xapian search engine.
GNU General Public License v2.0

Error: not a Word Document #83

Closed: tmc9031 closed this issue 7 years ago

tmc9031 commented 7 years ago

Version: 1.6.6

Command:

ruby /home/redmine/plugins/redmine_xapian/extra/xapian_indexer.rb -f -v

Output:

Indexing "129124445033.doc" as application/msword ... added

But for another .doc file:

Indexing "130124191044.doc" as application/msword ... /home/redmine/files/130124191044.doc is not a Word Document.
Skipping - "antiword -mUTF-8.txt /home/redmine/files/130124191044.doc" failed

What is wrong? catdoc can display the text of 130124191044.doc without problems. How does redmine_xapian decide whether a .doc file is a Word document or not?
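To check a problem file outside the indexer, it can help to look at the two things the log above involves: the MIME type libmagic reports (the `file` command is the command-line front end to libmagic) and whether antiword accepts the file with the same `-mUTF-8.txt` option shown in the log. The sketch below is a hypothetical diagnostic helper, not part of redmine_xapian, and it assumes `file` and `antiword` are on the `PATH`; it reports `nil` for a tool that is not installed.

```ruby
require "open3"

# Hypothetical diagnostic helper (not part of redmine_xapian).
# Reports what libmagic (via the `file` CLI) thinks the file is, and whether
# antiword can read it with the options seen in the indexer log.
def diagnose_doc(path)
  result = {}

  begin
    out, status = Open3.capture2("file", "--brief", "--mime-type", path)
    result[:mime_type] = status.success? ? out.strip : nil
  rescue Errno::ENOENT
    result[:mime_type] = nil # `file` is not installed
  end

  begin
    # capture2e merges stdout and stderr, so antiword's error text is kept.
    out, status = Open3.capture2e("antiword", "-mUTF-8.txt", path)
    result[:antiword_ok] = status.success?
    result[:antiword_message] = out.strip unless status.success?
  rescue Errno::ENOENT
    result[:antiword_ok] = nil # antiword is not installed
  end

  result
end

if __FILE__ == $PROGRAM_NAME
  ARGV.each { |path| puts "#{path}: #{diagnose_doc(path).inspect}" }
end
```

Run it as `ruby diagnose_doc.rb /home/redmine/files/130124191044.doc` (the script name is an assumption). If libmagic reports `application/msword` but antiword fails, the file is routed to antiword and rejected there, regardless of whether catdoc can read it.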

picman commented 7 years ago

xapian_indexer.rb uses the omindex command. omindex, in turn, uses installed tools to convert different document types to plain text. According to the documentation:

MS Word documents (.dot) if antiword is available (.doc files are left to libmagic, as they may actually be RTF (AbiWord saves RTF when asked to save as .doc, and Microsoft Word quietly loads RTF files with a .doc extension), or plain-text).

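*.doc documents are not converted with catdoc: omindex leaves them to the libmagic library to detect the actual file type, and a file detected as application/msword is then passed to antiword, as the log above shows. See https://xapian.org/docs/omega/overview.html.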
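The documentation's point is that a file named `.doc` may really be RTF or plain text. A rough idea of how content-based detection tells these apart can be sketched by looking at the first bytes of the file. This is a simplified, hypothetical heuristic for illustration only; libmagic's real rules are much more detailed. It checks for the OLE2 compound-file signature (the container used by binary Word 97 and later, and also by Excel and PowerPoint), the `{\rtf` prefix of RTF, and otherwise printable text.

```ruby
# Hypothetical sketch: guess what a ".doc" file really contains from its
# first bytes. Much simpler than libmagic; for illustration only.
OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b
RTF_PREFIX = "{\\rtf".b
TEXT_CONTROL_BYTES = [0x09, 0x0A, 0x0D].freeze # tab, LF, CR

def sniff_doc_bytes(bytes)
  head = bytes.b
  # OLE2 container: binary Word 97+ (but also .xls/.ppt), so this alone
  # does not prove antiword will accept the file.
  return :ole2 if head.start_with?(OLE2_SIGNATURE)
  # RTF saved under a .doc name (e.g. by AbiWord).
  return :rtf if head.start_with?(RTF_PREFIX)

  sample = head[0, 512]
  printable = sample.bytes.all? { |b| b >= 0x20 || TEXT_CONTROL_BYTES.include?(b) }
  return :text if !sample.empty? && printable

  :unknown
end
```

For a file on disk, something like `sniff_doc_bytes(File.binread(path, 512))` reads just the header. A file that comes back as `:rtf` or `:text` here would explain an antiword "is not a Word Document" error while another tool still shows its text.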