jaeksoft / opensearchserver

Open-source Enterprise Grade Search Engine Software
http://www.opensearchserver.com
Apache License 2.0
499 stars, 191 forks

Patterns added using the REST API are not being crawled; patterns added manually are crawled. #1903

Open · JimHha opened 6 years ago


When I add patterns ending in a wildcard to OSS using either curl or the Python requests library, the web crawler does not fetch them. I get a success response each time, and I have verified afterwards that the patterns are present.
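A minimal sketch of the REST step, for anyone trying to reproduce this. It uses only the standard library and builds the requests without sending them. The base URL, the index name `my_index`, and the endpoint path `/services/rest/index/{index}/crawler/web/patterns/inclusion` with a JSON list body are assumptions (check them against the REST API documentation for your installed OSS version), and any authentication parameters your instance requires are omitted.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTIONS (hypothetical values; verify against your OSS version's REST docs):
# - OSS listens on http://localhost:9090
# - the index is called "my_index"
# - inclusion patterns live at /services/rest/index/{index}/crawler/web/patterns/inclusion
BASE_URL = "http://localhost:9090"
INDEX = "my_index"


def patterns_url(base_url: str, index: str) -> str:
    """Build the (assumed) inclusion-patterns endpoint for an index."""
    return (f"{base_url}/services/rest/index/"
            f"{urllib.parse.quote(index)}/crawler/web/patterns/inclusion")


def build_add_request(base_url: str, index: str, patterns: list) -> urllib.request.Request:
    """Prepare (but do not send) a PUT that submits the patterns as a JSON list."""
    body = json.dumps(patterns).encode("utf-8")
    return urllib.request.Request(
        patterns_url(base_url, index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_list_request(base_url: str, index: str) -> urllib.request.Request:
    """Prepare a GET that lists the current inclusion patterns, to confirm the PUT took effect."""
    return urllib.request.Request(patterns_url(base_url, index), method="GET")


if __name__ == "__main__":
    # A pattern with a trailing wildcard, as in this report.
    add_req = build_add_request(BASE_URL, INDEX, ["http://www.example.com/*"])
    print(add_req.get_method(), add_req.full_url)
    print(add_req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(add_req), then check the response status.
```

If the sketch reproduces the issue, the GET from `build_list_request` should show the pattern as present even though the crawler never fetches it.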

Occasionally, if I request a manual crawl of a REST-added pattern that OSS isn't crawling, the manual crawl may trigger the crawler.

Even if I wait long enough for a re-fetch, it still doesn't crawl the URLs added using the REST API.

If I add patterns using the REST API, see that they aren't being crawled, and then add a new URL manually, the manually added URL is crawled while the ones previously added through the REST API still aren't.

However, if I add the same patterns manually through the interface and start the crawler, they are fetched, parsed and indexed just fine.

I installed OSS from the .deb package and have tested on both Debian 8.6 and Debian 9, without success on either.