Closed JoshTango closed 3 years ago
Have you tried creating your own IHyperLinkParser, or extending the AngleSharpHyperlinkParser, to implement this logic? It wouldn't be hard to do. You would also need to change the following to make sure it downloads the content of the sitemap URL...
config.DownloadableContentTypes = "text/html, application/xml";
I might one day, but sitemap.xml is such a generalized, standard thing these days that I thought you might want to build it into Abot.
Abot doesn't use sitemaps to help discover pages to crawl?
Its default behavior is to crawl the site based on real, navigable links. The sitemap can be completely out of sync with the real site, so it was never part of the original design. However, you can implement your own IHyperLinkParser, as mentioned above, that will use the sitemap.
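For anyone who wants to go this route, the XML-parsing half is the easy part. The sketch below is only illustrative — the exact IHyperLinkParser/AngleSharpHyperlinkParser member signatures vary between Abot versions, so `GetSitemapLinks` here is a standalone helper you'd call from your own parser implementation, not an Abot API:

```csharp
// Rough sketch, not Abot's API: extracts every <loc> URL from a
// sitemap.xml payload so a custom IHyperLinkParser can return them
// as crawlable links. Assumes the standard sitemaps.org namespace.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapLinkExtractor
{
    public static IEnumerable<Uri> GetSitemapLinks(string sitemapXml)
    {
        // The sitemap protocol namespace; most real sitemaps declare it.
        XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
        var doc = XDocument.Parse(sitemapXml);

        // <loc> appears in both <urlset> and <sitemapindex> documents,
        // so this also surfaces nested sitemap URLs for re-crawling.
        return doc.Descendants(ns + "loc")
                  .Select(e => e.Value.Trim())
                  .Where(v => Uri.IsWellFormedUriString(v, UriKind.Absolute))
                  .Select(v => new Uri(v))
                  .ToList();
    }
}
```

Your custom parser would then branch on content type: delegate text/html pages to the base AngleSharpHyperlinkParser behavior, and run application/xml pages through something like the helper above — which is why the `DownloadableContentTypes` change mentioned earlier is also required.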
In my experience, we have used sitemaps extensively to help search engines index pages of our sites that they may otherwise have trouble finding. So yeah, we'll have to implement this internally, I guess.
If I feed a sitemap.xml link into Abot, ParsedLinks is null. Now, a lot of websites' sitemap.xml files look like this: