Open indrajithi opened 2 weeks ago
Wouldn't that be set by the Spider.max_links value?
@lodenrogue
max_links is basically the maximum number of hops the crawler will make. Say we start from github.com
as the root URL. In the first crawl we fetch all the links on github.com, and then we recursively crawl the links we fetched until the max_links
count is reached.
E.g., say we found three links from the root URL: [URL1, URL2, URL3].
If max_links is set to 2, we will only crawl [URL1, URL2]
and fetch the links in those pages.
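To make the current behavior concrete, here is a minimal sketch of a root-based crawl capped by max_links. It is not the project's actual code: `FAKE_WEB`, `fetch_links`, and `crawl_from_root` are hypothetical names, the dictionary stands in for real HTTP fetches, and it assumes the cap counts pages crawled after the root.

```python
from collections import deque

# Hypothetical link graph standing in for real HTTP requests.
FAKE_WEB = {
    "ROOT": ["URL1", "URL2", "URL3"],
    "URL1": ["URL4"],
    "URL2": [],
    "URL3": [],
    "URL4": [],
}


def fetch_links(url):
    """Return the links found on a page (assumption: looked up in FAKE_WEB)."""
    return FAKE_WEB.get(url, [])


def crawl_from_root(root_url, max_links):
    """Crawl outward from root_url, visiting at most max_links discovered links."""
    crawled = []
    seen = {root_url}
    queue = deque()

    # First crawl: collect every link on the root page.
    for link in fetch_links(root_url):
        if link not in seen:
            seen.add(link)
            queue.append(link)

    # Then crawl the discovered links until the cap is reached.
    while queue and len(crawled) < max_links:
        url = queue.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return crawled


print(crawl_from_root("ROOT", max_links=2))
```

With max_links=2 this visits only URL1 and URL2, matching the example above; URL3 is discovered but never crawled.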
With this feature, we expect the crawler to fetch only the URLs provided by the user and nothing more. The list of URLs to crawl will be a custom set provided by the user as input. There will be no root-URL-based crawling and no hops.
For example, given url_list = [URL1, URL2, URL3], we will loop through url_list and fetch each URL, but there will be no root URL. If I am understanding this correctly, I would love to work on it. Could this issue be assigned to me?
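A rough sketch of what that could look like, under the assumption that the crawler only loops over url_list and never follows the links it finds. `crawl_url_list`, `fetch_links`, and `FAKE_WEB` are hypothetical names for illustration, not part of the existing Spider API, and the dictionary replaces real HTTP requests.

```python
# Hypothetical pages standing in for real HTTP requests.
FAKE_WEB = {
    "URL1": ["URL4"],
    "URL2": [],
    "URL3": ["URL5"],
}


def fetch_links(url):
    """Return the links found on a page (assumption: looked up in FAKE_WEB)."""
    return FAKE_WEB.get(url, [])


def crawl_url_list(url_list):
    """Fetch only the user-provided URLs: no root URL and no hops."""
    results = {}
    for url in url_list:
        if url in results:
            # Skip duplicates in the user's input.
            continue
        # Record the links found, but do not crawl them.
        results[url] = fetch_links(url)
    return results


results = crawl_url_list(["URL1", "URL2", "URL3"])
print(list(results))
```

Here URL4 and URL5 show up in the results as found links, but they are never fetched themselves, since they were not in the user's list.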
Hi @C0DE-SLAYER.
Please let us know if you are working on this.
@indrajithi yes, I will open a PR today.