internetarchive / Zeno

State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0

Implement linkheader parsing #95

Closed: HarshNarayanJha closed this 3 months ago.

HarshNarayanJha commented 3 months ago

This PR adds a Parse function for parsing HTTP Link headers, which removes the dependency on github.com/tomnomnom/linkheader. It also adds unit tests for the function.
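For context, an HTTP Link header (RFC 8288) carries comma-separated entries of the form <url>; rel="next"; key="value". Below is a minimal, dependency-free sketch of such a parser. It is not the code from this PR: the Link struct and its URL, Rel and Params fields are hypothetical, and the sketch assumes commas and semicolons never appear inside URLs or quoted parameter values.

```go
package main

import (
	"fmt"
	"strings"
)

// Link is a hypothetical representation of one entry in a Link header.
type Link struct {
	URL    string
	Rel    string
	Params map[string]string
}

// Parse splits a Link header value into entries. Sketch only: it assumes
// commas and semicolons never occur inside URLs or quoted values.
// It always returns a non-nil slice, empty when nothing parses.
func Parse(header string) []Link {
	links := []Link{}
	for _, entry := range strings.Split(header, ",") {
		parts := strings.Split(entry, ";")
		rawURL := strings.TrimSpace(parts[0])
		// Skip entries whose target is not wrapped in angle brackets.
		if !strings.HasPrefix(rawURL, "<") || !strings.HasSuffix(rawURL, ">") {
			continue
		}
		link := Link{URL: strings.Trim(rawURL, "<>"), Params: map[string]string{}}
		// Remaining parts are key="value" parameters.
		for _, p := range parts[1:] {
			key, value, _ := strings.Cut(p, "=")
			key = strings.ToLower(strings.TrimSpace(key))
			value = strings.Trim(strings.TrimSpace(value), `"`)
			if key == "" {
				continue
			}
			if key == "rel" {
				link.Rel = value
			} else {
				link.Params[key] = value
			}
		}
		links = append(links, link)
	}
	return links
}

func main() {
	h := `<https://example.com/page2>; rel="next", <https://example.com/page9>; rel="last"`
	for _, l := range Parse(h) {
		fmt.Println(l.URL, l.Rel)
	}
}
```

Returning an empty, non-nil slice keeps callers from having to special-case nil when a header is missing or unparseable.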

Closes #85

HarshNarayanJha commented 3 months ago

Run the tests with go test -v $(go list ./... | grep crawl), or run go test -v from the crawl directory. I'm not sure why, but go test doesn't run all the tests from the project root.

HarshNarayanJha commented 3 months ago

Is the implementation alright? Should I go ahead and replace the usages in capture.go?

HarshNarayanJha commented 3 months ago

Replaced the regexp with Trim and Split. The function now also tries to extract a URL from malformed input, and otherwise returns an empty slice.
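The fallback described above might look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: extractURL and ParseURLs are hypothetical names, and using net/url to check that a scheme and host survived is a choice made here. One limitation: RFC 8288 permits relative references, which this host check would reject.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractURL is a hypothetical helper: it pulls a URL out of a single Link
// header entry, tolerating a missing '<' or '>', and reports false when
// nothing URL-like can be recovered.
func extractURL(entry string) (string, bool) {
	// Everything before the first ';' is treated as the link target.
	target, _, _ := strings.Cut(entry, ";")
	target = strings.TrimSpace(target)
	target = strings.TrimPrefix(target, "<")
	target = strings.TrimSuffix(target, ">")
	u, err := url.Parse(target)
	// Require an absolute URL; relative references are rejected here.
	if err != nil || u.Scheme == "" || u.Host == "" {
		return "", false
	}
	return u.String(), true
}

// ParseURLs returns the recoverable URLs from a Link header value. It
// returns an empty, non-nil slice when nothing can be parsed.
func ParseURLs(header string) []string {
	urls := []string{}
	for _, entry := range strings.Split(header, ",") {
		if u, ok := extractURL(entry); ok {
			urls = append(urls, u)
		}
	}
	return urls
}

func main() {
	fmt.Println(ParseURLs(`<https://example.com/a>; rel="next"`)) // well-formed
	fmt.Println(ParseURLs(`https://example.com/b; rel="next"`))   // missing angle brackets
	fmt.Println(ParseURLs(`<https://example.com/c; rel="next"`))  // missing closing '>'
	fmt.Println(ParseURLs(`garbage`))                             // nothing recoverable
}
```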

HarshNarayanJha commented 3 months ago

Removed the linkheader dependency and updated capture.go to use the new Parse function.

HarshNarayanJha commented 3 months ago

Is something preventing the merge? Did I miss anything?