jaeksoft / opensearchserver

Open-source Enterprise Grade Search Engine Software
http://www.opensearchserver.com
Apache License 2.0

Parse error 1 only when crawling a directory #1803

Open Bouhmarc opened 8 years ago

Bouhmarc commented 8 years ago

Hi,

I tried OpenSearchServer this weekend, and first of all I'd like to thank the whole team: it's a really great job! But I ran into a few small problems. The first one is that my files are not parsed when I use the crawler job I defined on the website. In the log I get an error (stack trace below), but this error does not appear when I use the REST API to parse the same document. How can I get more information about what throws error 1 when parsing a file?

It happens with both a PDF file and a .docx file (both of them work when parsed with the API). I have attached the .docx file. Thank you for your answers.

Here is the stack trace:

08:03:39,554 WARN: root - Error while working on URL: file:/tmp/equipements-location-checkliste.docx : Process exited with an error: 1 (Exit value: 1)
08:03:39,554 WARN: root - Error while working on URL: file:/tmp/equipements-location-checkliste.docx : Process exited with an error: 1 (Exit value: 1)
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
    at com.jaeksoft.searchlib.util.ExecuteUtils.command(ExecuteUtils.java:86)
    at com.jaeksoft.searchlib.parser.ExternalParser.doParserContent(ExternalParser.java:198)
    at com.jaeksoft.searchlib.parser.Parser.doParserContentExternal(Parser.java:142)
    at com.jaeksoft.searchlib.parser.ParserSelector.parserLoop(ParserSelector.java:499)
    at com.jaeksoft.searchlib.parser.ParserSelector.parseFileInstance(ParserSelector.java:579)
    at com.jaeksoft.searchlib.crawler.file.spider.CrawlFile.download(CrawlFile.java:88)
    at com.jaeksoft.searchlib.crawler.file.process.CrawlFileThread.crawl(CrawlFileThread.java:152)
    at com.jaeksoft.searchlib.crawler.file.process.CrawlFileThread.browse(CrawlFileThread.java:117)
    at com.jaeksoft.searchlib.crawler.file.process.CrawlFileThread.browse(CrawlFileThread.java:108)
    at com.jaeksoft.searchlib.crawler.file.process.CrawlFileThread.runner(CrawlFileThread.java:133)
    at com.jaeksoft.searchlib.process.ThreadAbstract.run(ThreadAbstract.java:300)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

equipements-location-checkliste.docx

Bouhmarc commented 8 years ago

Hi everybody,

I found my problem: I had checked the box "Parsing uses external process".

So either there is something I didn't understand, or this option doesn't work in my case. What I understood: this option starts a new process for every parse job, so if it crashes, it doesn't affect the main program. Am I right about this?

Regards,
Marc
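
For readers following along, the idea of running each job in its own process can be sketched as below. This is only an illustration and not the OpenSearchServer implementation: the stack trace above shows the project going through Apache Commons Exec, whose `DefaultExecutor` throws `ExecuteException` on a non-zero exit value, whereas this sketch uses the JDK's `ProcessBuilder`. The class `IsolatedRun`, its `Result` type, and the `javaBinary` helper are hypothetical names, and the child command (the JVM's own `java` launcher) is just a stand-in for a real parser command. The sketch assumes Java 9 or later. It also captures the child's output, which is usually where the reason for an exit value of 1 can be found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of OpenSearchServer.
final class IsolatedRun {

    /** Exit code and combined stdout/stderr of a finished child process. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command in a separate OS process and waits for it to finish. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so error messages are captured in one stream.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        String output;
        try (InputStream in = process.getInputStream()) {
            // Read until the child closes its output, i.e. until it exits.
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        // A crash or failure in the child only shows up here as a non-zero exit code.
        int exitCode = process.waitFor();
        return new Result(exitCode, output);
    }

    /** Path of the java launcher belonging to the running JVM (stand-in child command). */
    static String javaBinary() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    public static void main(String[] args) throws Exception {
        // A child that succeeds.
        Result ok = run(Arrays.asList(javaBinary(), "-version"));
        System.out.println("exit=" + ok.exitCode);

        // A child that fails: the parent keeps running and can log why.
        Result failed = run(Arrays.asList(javaBinary(), "--no-such-option"));
        System.out.println("exit=" + failed.exitCode);
        System.out.println(failed.output);
    }
}
```

The parent process only sees the exit code and whatever the child printed, so a failing child cannot take the parent down with it. It also means that the exit code alone says little, and the captured output is worth logging alongside it.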