sjdirect / abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Apache License 2.0

Meta Tag Crawling #215

Closed: shaun-hutch closed this 4 years ago

shaun-hutch commented 4 years ago

This feature lets the crawler follow a URL found inside an HTML <meta> tag, such as a meta redirect. The extracted URL is added to the page's list of links so that it can be crawled like any other link.

I have added a new property, FollowMetaRedirects, to CrawlConfiguration.cs in Abot2. It defaults to false.
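To illustrate the idea, here is a minimal standalone sketch of pulling the target URL out of a `<meta http-equiv="refresh">` tag and resolving it against the page URL. It is not the code in this pull request and does not use Abot's types: the class `MetaRedirectSketch`, the method `TryGetMetaRedirect`, and the `followMetaRedirects` parameter (standing in for the FollowMetaRedirects setting) are hypothetical names, and the regex assumes `http-equiv` appears before `content`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Abot.
public static class MetaRedirectSketch
{
    // Matches <meta http-equiv="refresh" content="...">.
    // Assumption: http-equiv comes before content in the tag.
    private static readonly Regex MetaRefresh = new Regex(
        @"<meta\s+http-equiv\s*=\s*[""']?refresh[""']?\s+content\s*=\s*(?<q>[""'])(?<content>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches the "url=..." part of the content value, e.g. "5; url=next.html".
    private static readonly Regex UrlPart = new Regex(
        @"url\s*=\s*(?<url>.+)$",
        RegexOptions.IgnoreCase);

    // Returns the resolved redirect target, or null when the flag is off,
    // no meta refresh tag is found, or the URL cannot be parsed.
    public static Uri TryGetMetaRedirect(string html, Uri pageUri, bool followMetaRedirects)
    {
        if (!followMetaRedirects || string.IsNullOrEmpty(html))
            return null;

        Match meta = MetaRefresh.Match(html);
        if (!meta.Success)
            return null;

        Match url = UrlPart.Match(meta.Groups["content"].Value);
        if (!url.Success)
            return null;

        // Strip whitespace and optional quotes around the URL value.
        string raw = url.Groups["url"].Value.Trim().Trim('\'', '"');

        // Resolve relative URLs against the page that contained the tag.
        return Uri.TryCreate(pageUri, raw, out Uri resolved) ? resolved : null;
    }
}
```

A caller would pass the downloaded page's HTML and URI, then append any non-null result to the links scheduled for crawling. A regex keeps the sketch dependency-free; an HTML parser would be more robust against unusual attribute ordering.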

shaun-hutch commented 4 years ago

Hi Steven,

Thank you for your feedback on my pull request. I will make the appropriate changes and additions. The pull request is about to change in any case, as I need to refactor where the meta URL is obtained. My plan is to close this pull request and open a new one once I have put my changes together and written unit tests.

Regards,

Shaun