sjdirect / abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Apache License 2.0

Improved base tag handling #232

Status: Closed (thedeedawg closed this 3 years ago)

thedeedawg commented 3 years ago

Addresses issues #230 and #231.

This change improves how the HyperLinkParser handles base tags. Specifically, it adds support for relative base tag values and fixes a bug in handling root-relative values (values beginning with a single "/", such as /path/) when running on Linux. See the linked issues above for details.
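The sketch below illustrates, in C#, one way to resolve a base tag value that may be absolute, relative, or root-relative. The names BaseTagResolver and Resolve are hypothetical and are not Abot's actual HyperLinkParser API. It assumes the Linux bug stems from a .NET behavior on Unix-like systems: parsing a root-relative string such as "/root/" with UriKind.Absolute yields an implicit file:// URI instead of failing. The sketch therefore accepts an absolute value only when it uses http or https, and otherwise parses the value as a relative reference and resolves it against the page URI.

```csharp
using System;

// Hypothetical helper for illustration; not Abot's actual HyperLinkParser API.
public static class BaseTagResolver
{
    // Returns the URI that links on the page should be resolved against.
    // Falls back to the page URI when the base href is empty or unusable.
    public static Uri Resolve(Uri pageUri, string baseHref)
    {
        if (string.IsNullOrWhiteSpace(baseHref))
            return pageUri;

        string href = baseHref.Trim();

        // Accept an absolute base only if it is http(s). On Unix-like systems,
        // .NET parses "/root/" with UriKind.Absolute as an implicit file:// URI,
        // so a root-relative value could otherwise be mistaken for an absolute one.
        if (Uri.TryCreate(href, UriKind.Absolute, out Uri absolute) &&
            (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
        {
            return absolute;
        }

        // Treat everything else as a relative reference ("sub/", "/root/", "../up/")
        // and resolve it against the page URI.
        if (Uri.TryCreate(href, UriKind.Relative, out Uri relative) &&
            Uri.TryCreate(pageUri, relative, out Uri combined))
        {
            return combined;
        }

        // Invalid values must not break link parsing: keep the page URI.
        return pageUri;
    }
}
```

For a page at http://example.com/a/page.html, a base value of "/root/" resolves to http://example.com/root/ and "sub/" resolves to http://example.com/a/sub/, while an empty or unparseable value falls back to the page URI. Parsing with an explicit UriKind.Relative, rather than relying on RelativeOrAbsolute, keeps the result independent of how a given .NET version treats Unix-style paths.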

Test coverage for this functionality has also been expanded. Because the base tag tests were otherwise identical, they have been consolidated into a single data-driven test with multiple expected outcomes. All existing test cases are still present, but the input for the invalid base tag case had to change. The old value, http:http://http:, is technically a valid relative value according to the relevant standard, so the new logic, which now supports relative values, successfully creates a Uri instance from it. The purpose of the test case (ensuring that invalid values do not break parsing) remains intact.

sjdirect commented 3 years ago

Would you be willing to refactor your changes to the Test class, please? I understand semantically what you are trying to achieve with some of your changes, but the diff makes it hard to quickly understand their impact on the tests. It's more important to keep the code base consistent and to keep the PR as MINIMAL as possible. Also, please do NOT delete or alter any tests unless necessary, as doing so removes confidence that the change isn't causing issues with the original set of tests.

sjdirect commented 3 years ago

PR accepted, appreciate the contribution