webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://webrecorder.net/browsertrix
GNU Affero General Public License v3.0

[Feature]: Add support for custom crawl headers #2108

Open tw4l opened 1 month ago

tw4l commented 1 month ago

What change would you like to see?

Requested on IIPC Slack:

"We need the option to set a request header name and value in the configuration. It could be e.g. cookie: Zxcv1234 and it should then only be set in the request headers if domain name or URL is specified else set on all URLs request. Header Name = cookie Header Value = Zxcv1234"

Context

From IIPC Slack: "What we are missing is the option to manually add multiple custom headers and cookies to a crawl configuration, it could be the same place as user-agent is now. Some hosts offer that as the only access to their content behind a paywall."

This would require adding support for custom headers to the crawler as well as the Browsertrix workflow editor.
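
To make the requested scoping rule concrete, here is a minimal sketch of how a header could be applied only to a given domain or URL, and to every request when no scope is set. It is an illustration only, not the crawler's or Browsertrix's actual implementation: `HeaderRule`, `rule_applies`, `headers_for_url`, the `scope` field, and the `x-archive-token` header are all hypothetical names chosen for this example. The `cookie: Zxcv1234` pair comes from the request above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class HeaderRule:
    """Hypothetical config entry: one custom header with an optional scope."""
    name: str
    value: str
    # Assumption: scope is either a bare domain ("example.com") or a full
    # URL prefix ("https://example.com/paywall/"). None means "all requests".
    scope: Optional[str] = None


def rule_applies(rule: HeaderRule, url: str) -> bool:
    """Return True if the rule's header should be sent with a request to url."""
    if rule.scope is None:
        return True  # unscoped headers go on every request
    if "://" in rule.scope:
        # Treat a scope containing a scheme as a URL prefix.
        return url.startswith(rule.scope)
    # Otherwise treat the scope as a domain and match it or any subdomain.
    host = (urlsplit(url).hostname or "").lower()
    domain = rule.scope.lower()
    return host == domain or host.endswith("." + domain)


def headers_for_url(rules: List[HeaderRule], url: str) -> Dict[str, str]:
    """Collect the custom headers that apply to url; later rules win on clashes."""
    headers: Dict[str, str] = {}
    for rule in rules:
        if rule_applies(rule, url):
            headers[rule.name] = rule.value
    return headers


# Example rules: the cookie from the request, scoped to one domain, plus a
# hypothetical unscoped header that is sent everywhere.
rules = [
    HeaderRule("cookie", "Zxcv1234", scope="example.com"),
    HeaderRule("x-archive-token", "abc"),
]
on_domain = headers_for_url(rules, "https://www.example.com/article")
off_domain = headers_for_url(rules, "https://other.org/")
```

With these rules, `on_domain` contains both headers, while `off_domain` contains only the unscoped `x-archive-token`. Matching on subdomains is a design choice made for this sketch; a real implementation might instead require exact host matches or reuse the crawl's existing scope rules.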

ikreymer commented 1 month ago

Hm, most of this can be solved with browser profiles, which offer a more user-friendly interface than tracking custom headers, especially for cookies. I suppose if there's a reason to support custom headers beyond user agent, it's definitely doable, but I wonder how broadly useful this would be.