Open ikreymer opened 1 year ago
As a user, I don't mind doing the recording in parts, for example, with 10,000 URLs at a time. In this case, for 80,000 seeds I would only have to make 8 crawls, which I would add to a "collection" (which is one of the interesting features you've created). Perfect.
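Splitting a large seed list into fixed-size batches, as described above, is straightforward to script. A minimal sketch (hypothetical helper, not part of the tool itself), assuming one URL per list entry:

```python
def chunk_seeds(seeds, chunk_size=10_000):
    """Split a seed list into batches of at most chunk_size URLs."""
    return [seeds[i:i + chunk_size] for i in range(0, len(seeds), chunk_size)]

# 80,000 seeds split into batches of 10,000 gives 8 crawls,
# matching the workflow described in the comment above.
seeds = [f"https://example.com/page/{i}" for i in range(80_000)]
batches = chunk_seeds(seeds)
print(len(batches))  # 8
```

Each batch could then be submitted as its own crawl and the results grouped into a collection.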
But the feature I miss the most is the one you mention above "2) User should be able to validate a large list of URLs, and receive good error messages on which URLs in a large list are invalid."
If I know precisely which URLs are considered invalid, I can correct them manually or delete them from the list. But in a list with thousands of lines it's very difficult for me, as an ordinary user, to find the error.
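The kind of validation feedback asked for here could look something like the following sketch (a hypothetical helper using only a simple scheme/host check, not the actual validation logic of the tool): it reports the line number and text of each entry that does not parse as an http(s) URL, so the user can correct or delete those lines.

```python
from urllib.parse import urlparse

def find_invalid_urls(lines):
    """Return (line_number, url) pairs for entries that do not
    parse as http(s) URLs. Blank lines are skipped."""
    invalid = []
    for lineno, raw in enumerate(lines, start=1):
        url = raw.strip()
        if not url:
            continue
        parsed = urlparse(url)
        # Require an http/https scheme and a non-empty host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            invalid.append((lineno, url))
    return invalid

seeds = ["https://example.com/", "htp://typo.example", "example.org/page"]
print(find_invalid_urls(seeds))
# [(2, 'htp://typo.example'), (3, 'example.org/page')]
```

Reporting line numbers rather than just a pass/fail result is the key point: with thousands of seeds, the user needs to know exactly which entries to fix.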
About the list limit: I made a recording of 4,000 URLs (auto-scroll, block ads) and got a 44 GB WACZ. If I did the same with 10,000 URLs I'd get roughly 110 GB. If I followed "Any link on the page" I'd get an even bigger WACZ. As a user I find it useful to have a limit.
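The size figure above is a simple linear extrapolation from the observed crawl, which can be sketched as (hypothetical helper, assuming archive size grows roughly linearly with page count):

```python
def estimate_wacz_gb(observed_urls, observed_gb, target_urls):
    """Linearly extrapolate archive size from an observed crawl."""
    return observed_gb / observed_urls * target_urls

# 4,000 URLs -> 44 GB observed; extrapolate to 10,000 URLs.
print(estimate_wacz_gb(4_000, 44, 10_000))  # ~110 GB
```

Actual sizes will vary with page weight and crawl options (auto-scroll, ad blocking, link following), so this is only a rough planning figure.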
Will track changes/tasks related to URL list handling here:
Context
The URL list crawl type works well for a small number of URLs (tens or hundreds), but there may be potential issues when entering thousands of URLs.
User Story Sentence
As a user, I'd like to crawl 30,000 URLs, or 100,000 URLs in a single page crawl.
Requirements
Questions
Tasks