vivekthevish / sitemap


May not be generating complete sitemap #1

Open hrishin opened 5 years ago

hrishin commented 5 years ago

This program gets the first level of links on the site, then iterates over those and gets the second level of links. What about the links found on the second-level pages: does it visit those third-level links as well? Does this program generate the complete sitemap?

For example, let's assume foobar.com has the following sitemap:

Here each letter represents a link within foobar.com. Does this program also include links F and G in the final sitemap?

vivekthevish commented 5 years ago

I believe it may not be generating the complete sitemap. I read about the concept of 'depth' for this, but it was not clear to me, so I did not apply it in my code.

The code crawls all the links on the main page and then goes one level deeper, crawling the links found on each of those pages.

Additionally, there was a chance of infinite crawling, as I was not able to remove duplicate links.
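One common way to handle both the depth question and the duplicate links is a breadth-first crawl that keeps a set of already-seen URLs. The following is only a minimal sketch, not the code in this repository: `crawl`, `get_links`, `max_depth`, `SITE`, and `site_links` are hypothetical names, and the in-memory `SITE` dictionary stands in for real HTTP fetching and HTML parsing.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: Optional[int] = None) -> List[str]:
    """Breadth-first crawl from `start`, recording each URL at most once.

    `get_links` is a hypothetical callback returning the links on a page.
    With `max_depth=None` the crawl continues until no new links remain.
    """
    visited = {start}            # URLs already seen; prevents infinite loops
    order = [start]              # URLs in the order they were discovered
    queue = deque([(start, 0)])  # (url, depth) pairs waiting to be expanded

    while queue:
        url, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # page is at the depth limit: keep it, do not expand it
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical site with cycles ("/a" links back to "/", "/d" back to "/a").
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/d": ["/a"],
}


def site_links(url: str) -> List[str]:
    """Stand-in for fetching a page and extracting its links."""
    return SITE.get(url, [])
```

Calling `crawl("/", site_links)` with no depth limit reaches every page and still terminates, because a URL is queued only the first time it is seen. Passing `max_depth=2` stops after the second level of links, which is roughly the behaviour described above.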

Let me know your view on this.