hobama / text-mining

Automatically exported from code.google.com/p/text-mining

Word 2 files don't work with WordTextExtractorFactory #6

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a test method in the TestWord2TextExtraction class:

public void testWord2ExtractorFromFactory() throws Exception {
      String fileName = "./test/text/winword2/simple.doc";
      String text = "This is a simple test of text extraction for Word documents.";
      FileInputStream in = new FileInputStream(fileName);
      WordTextExtractorFactory fac = new WordTextExtractorFactory();
      TextExtractor extractor = fac.textExtractor(in);
      String testText = extractor.getText().trim();
      // JUnit convention: expected value first, actual value second
      assertEquals(text.trim(), testText);
  }

2. Run the test method.
3. See the exception.

What is the expected output? What do you see instead?
The expected output is that the text is extracted and the test passes. Instead, an exception is thrown.

What version of the product are you using? On what operating system?
I use the 1.0 release with Java 1.6.0_14 under Ubuntu Linux.

My patch contains a workaround. The stream must be reset before passing it
to the Word2TextExtractor constructor. 
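
For readers without the patch at hand, here is a minimal sketch of the general mark/reset pattern the workaround relies on. It is not the patch itself: the class StreamResetSketch and its helpers peekHeader and readAll are hypothetical names, and it assumes the factory consumes some leading bytes of the stream before handing it on, which the report implies but does not spell out. Only standard java.io classes are used. Note that a plain FileInputStream does not support mark/reset, so the sketch wraps its input in a BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StreamResetSketch {

    // Hypothetical helper: reads up to n leading bytes, then rewinds the
    // stream so the next consumer sees it from the first byte again.
    public static byte[] peekHeader(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n); // remember the current position, valid for n bytes
        byte[] header = new byte[n];
        int total = 0;
        while (total < n) {
            int read = in.read(header, total, n - total);
            if (read < 0) {
                break; // stream is shorter than n bytes
            }
            total += read;
        }
        in.reset(); // rewind to the marked position
        return Arrays.copyOf(header, total);
    }

    // Hypothetical stand-in for an extractor that needs the whole stream.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int read;
        while ((read = in.read(buf)) != -1) {
            out.write(buf, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        // BufferedInputStream adds mark/reset support to the wrapped stream.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] header = peekHeader(in, 4);
        byte[] all = readAll(in);
        System.out.println(header.length + " header bytes, "
                + all.length + " bytes after reset");
        // prints "4 header bytes, 8 bytes after reset"
    }
}
```

Without the reset() call, readAll would see only the bytes that follow the header, which is the kind of failure a downstream extractor would hit if it expects to start at the beginning of the file.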

Please provide any additional information below.

Original issue reported on code.google.com by antoni.mylka@gmail.com on 25 Aug 2009 at 12:53
