henhenfauzi / text-mining

Automatically exported from code.google.com/p/text-mining

Text extraction from MS Word ignores some words at the end of the file #3

Open · GoogleCodeExporter opened this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I'm using Alfresco 3.0 Labs Stable, which uses the textmining library to
extract text from MS Word files.
Converting the MS Word file to plain text shows that the exported text
ends prematurely.
Also, the Lucene indexer does not find the words "matase" and "chineza",
which appear in the last paragraph of the file.
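To tell whether the words are lost during extraction rather than during indexing, one can save the plain text the extractor produces and check whether words from the last paragraph survive. The following is a minimal sketch, assuming the extracted text is already available as a string. `missing_words` is a hypothetical helper, and its simple tokenization (Unicode word characters, case-insensitive) only approximates what the Lucene analyzer configured in Alfresco does.

```python
import re


def missing_words(extracted_text: str, expected: list[str]) -> list[str]:
    """Return the expected words that do not occur as tokens in extracted_text.

    Tokens are runs of Unicode word characters, compared case-insensitively.
    This is a rough stand-in for a search analyzer, not Lucene's own.
    """
    tokens = {t.casefold() for t in re.findall(r"\w+", extracted_text)}
    return [w for w in expected if w.casefold() not in tokens]
```

For example, if `missing_words(text, ["matase", "chineza"])` returns both words for the exported text, that is consistent with the extracted text ending before the last paragraph, so the problem would lie in the extraction step.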

I'm attaching the .DOC file.

Original issue reported on code.google.com by braila...@gmail.com on 23 Jan 2009 at 1:38

Attachments: