Closed: nickgeorgiou closed this issue 4 years ago.
I actually have a requirement where there are multiple entry points for crawling, e.g. a site that contains micro-sites or orphaned content that is not always reachable from a single entry point. Can we also factor that into this effort?
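The sketch below shows one way multiple entry points could fit in. It is written in Python and assumes the crawler does a breadth-first traversal and takes its seeds as full URLs. The function `crawl_from_seeds`, the `LINKS` link graph and the `get_links` callback are hypothetical stand-ins for real page fetching and link extraction. None of them come from the crawler's existing code.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_from_seeds(seeds, get_links):
    """Breadth-first crawl from several entry points (hypothetical helper).

    Every seed is enqueued up front, so pages reachable only from one seed
    (micro-sites, orphaned content) are still visited. A shared `seen` set
    stops pages reachable from more than one seed being crawled twice.
    """
    # Only follow links on hosts that at least one seed belongs to.
    hosts = {urlparse(s).netloc for s in seeds}
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if urlparse(link).netloc in hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph standing in for fetched pages (assumption for illustration).
LINKS = {
    "https://example.com/": ["https://example.com/about"],
    "https://example.com/about": ["https://example.com/"],
    # Orphaned micro-site: nothing under the root links here.
    "https://example.com/campaign/": ["https://example.com/campaign/faq"],
    "https://example.com/campaign/faq": [],
}

seeds = ["https://example.com/", "https://example.com/campaign/"]
print(crawl_from_seeds(seeds, lambda u: LINKS.get(u, [])))
```

Starting from the root alone would never reach `/campaign/` in this graph. Adding it as a second seed brings the orphaned pages into the crawl, and the shared `seen` set prevents duplicates.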
Description
The crawler currently always begins crawling from the root of the domain specified in the `domain` configuration property. Sometimes it is useful to begin crawling a site from a sub-page/path instead. The crawler would start with that page, so that pages linked from there would appear at the top of the list of URLs.

Proposed solution
Provide a configuration property, e.g. `starting_path`, that allows someone to specify a path from which to begin crawling, rather than always starting from the `/` root page.
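Here is a minimal Python sketch of how `starting_path` might be resolved and used. It assumes a breadth-first crawl restricted to the configured domain. Only `domain` comes from the existing configuration. `starting_path` is the property proposed above, and `start_url`, `crawl`, `LINKS` and the `get_links` callback are hypothetical names used for illustration.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical config: "domain" mirrors the existing property,
# "starting_path" is the proposed one.
config = {"domain": "https://example.com", "starting_path": "/blog/"}

# Toy link graph standing in for fetched pages (assumption for illustration).
LINKS = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog/"],
    "https://example.com/blog/": ["https://example.com/blog/post-1", "https://example.com/"],
    "https://example.com/blog/post-1": [],
    "https://example.com/about": [],
}


def start_url(cfg):
    """Resolve the first URL to crawl, defaulting to the domain root '/'."""
    path = cfg.get("starting_path", "/")
    return urljoin(cfg["domain"].rstrip("/") + "/", path)


def crawl(cfg, get_links):
    """Breadth-first crawl starting at the configured path.

    Links found on the start page are queued first, so they appear near
    the top of the returned URL list.
    """
    host = urlparse(cfg["domain"]).netloc
    first = start_url(cfg)
    seen = {first}
    order = []
    queue = deque([first])
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            # Stay on the configured domain and skip already-queued pages.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl(config, lambda u: LINKS.get(u, [])))
```

The queue is first-in, first-out, so pages linked from `/blog/` are listed before the root page. That matches the ordering the proposal asks for. Leaving out `starting_path` falls back to `/`, which preserves the current behaviour.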