webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://webrecorder.net/browsertrix
GNU Affero General Public License v3.0

Add support for custom selectors for extracting links #2152

Open tw4l opened 1 week ago

tw4l commented 1 week ago

Related to https://github.com/webrecorder/browsertrix-crawler/issues/217

We now have the ability in the crawler to specify custom selectors for extracting links, but this has not yet been added to the workflow editor UI.
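For illustration, here is a rough sketch of how the workflow editor's backend might validate one custom link selector entry before handing it to the crawler. It assumes entries are written as `css selector->property` or `css selector->@attribute`, with a bare selector falling back to `href`. That format, the fallback, and the names `LinkSelector` and `parse_link_selector` are assumptions made for this example, not confirmed crawler or Browsertrix APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSelector:
    selector: str       # CSS selector matching the elements that carry links
    extract: str        # name of the property or attribute holding the URL
    is_attribute: bool  # True for an HTML attribute, False for a DOM property


def parse_link_selector(raw: str) -> LinkSelector:
    """Parse one entry in the assumed 'selector->property' / 'selector->@attribute' form."""
    # Split on the last "->" so the selector part is kept whole.
    selector, sep, extract = raw.rpartition("->")
    if not sep:
        # No "->" present: treat the whole string as the selector and
        # fall back to reading the href property (assumed default).
        selector, extract = raw, "href"

    selector = selector.strip()
    extract = extract.strip()
    if not selector:
        raise ValueError("link selector is missing a CSS selector")

    # A leading "@" marks an attribute rather than a property (assumed convention).
    is_attribute = extract.startswith("@")
    if is_attribute:
        extract = extract[1:]
    if not extract:
        raise ValueError("link selector is missing a property or attribute name")

    return LinkSelector(selector, extract, is_attribute)
```

Validating each entry up front would let the UI show an inline error for a malformed selector instead of the crawl failing later.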

In sprint planning, we also briefly discussed whether we could offer a catch-all way to pass crawler flags that, like this one, have no corresponding Browsertrix UI element. One possibility is a text box where users could enter arbitrary flags, though this would require a good deal of thought and might have security implications.
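One way to limit the security exposure of such a text box would be to accept only allowlisted flags and to pass them to the crawler as an argument list, never as a shell string. The following is a minimal sketch under those assumptions. The contents of `ALLOWED_FLAGS` are illustrative placeholders rather than a vetted list, and `parse_extra_flags` is a hypothetical helper, not existing Browsertrix code.

```python
import shlex

# Illustrative placeholders only; a real allowlist would need review per flag.
ALLOWED_FLAGS = {"--selectLinks", "--pageExtraDelay"}


def parse_extra_flags(text: str) -> list[str]:
    """Turn free-form text-box input into a checked list of crawler arguments."""
    tokens = shlex.split(text)  # honors quoting, but never invokes a shell
    args: list[str] = []
    i = 0
    while i < len(tokens):
        # Accept both "--flag=value" and "--flag value".
        name, sep, value = tokens[i].partition("=")
        if name not in ALLOWED_FLAGS:
            raise ValueError(f"flag not allowed: {name!r}")
        if not sep:
            # The value must be the next token and must not be another flag.
            if i + 1 >= len(tokens) or tokens[i + 1].startswith("--"):
                raise ValueError(f"missing value for {name}")
            value = tokens[i + 1]
            i += 1
        args.extend([name, value])
        i += 1
    return args
```

Because the result is a list of separate arguments, shell metacharacters in the input are either rejected as unknown flags or passed through as literal values. This sketch assumes every allowed flag takes exactly one value. Boolean flags would need separate handling.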