jaeksoft / opensearchserver

Open-source Enterprise Grade Search Engine Software
http://www.opensearchserver.com
Apache License 2.0
499 stars, 191 forks

Indexation buffer is ignored #1463

Open davidebaldini opened 9 years ago

davidebaldini commented 9 years ago

OSS version 1.5.11

As the configuration in the screenshot shows, I've set my indexation buffer to 500 and started a recursive crawl of 4 URLs, each listed under "Pattern list" with a root wildcard, as in http://example.org/*.

The websites are large and the crawl takes weeks to complete, so it is essential that the buffer is periodically committed to the index. However, no indexation ever seems to be performed: the Committed columns (shown in the screenshot) list 0, and the number of docs in the index table (under the "Index" tab) remains unchanged.

Am I doing something wrong?

[Screenshot: crawler]

AlexandreToyer commented 9 years ago

Hi Davide,

You should use lower values for the parameters "Number of URLs to crawl" and "Maximum number of URLs per host", because those parameters apply to a single crawl session only, not to the crawl as a whole. With lower values, sessions finish sooner and you will regularly see content being indexed.
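To illustrate why smaller per-session limits make indexed content show up sooner, here is a toy Python model. It is not OpenSearchServer code: it assumes that fetched documents only become visible in the index once the session that fetched them has finished, which is what the advice above implies but which is not confirmed here. The names `commit_timeline`, `urls_per_session`, and `seconds_per_url` are made up for this sketch.

```python
def commit_timeline(total_urls, urls_per_session, seconds_per_url=1.0):
    """Return a list of (elapsed_seconds, committed_docs), one entry per session.

    Toy model under a stated assumption: documents fetched during a session
    are committed to the index only when that session ends.
    """
    if urls_per_session <= 0:
        raise ValueError("urls_per_session must be positive")

    timeline = []
    committed = 0
    elapsed = 0.0
    while committed < total_urls:
        # A session fetches at most urls_per_session URLs.
        batch = min(urls_per_session, total_urls - committed)
        elapsed += batch * seconds_per_url
        committed += batch  # assumed: commit happens at session end
        timeline.append((elapsed, committed))
    return timeline


# One huge session versus many small ones over the same 1000 URLs.
one_big_session = commit_timeline(total_urls=1000, urls_per_session=1000)
many_small_sessions = commit_timeline(total_urls=1000, urls_per_session=50)

print("first commit, one big session:    ", one_big_session[0])
print("first commit, many small sessions:", many_small_sessions[0])
```

In this model the total crawl time is the same in both cases. Only the time until the first documents appear in the index changes, and it shrinks in proportion to the session size.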

Have a look at this page to fully understand the process: http://www.opensearchserver.com/documentation/faq/crawling/how_to_configure_crawl_process_for_web_crawler.md

I would also advise you to upgrade to the latest build of version 1.5.12, since we fixed some issues related to the web crawler: http://www.open-search-server.com/ftp/OpenSearchServer_1.5/build-1.5-b940/

Best regards, Alexandre