Open despens opened 2 years ago
Yes, it's an interesting idea. I was thinking of perhaps a 'video log' to accompany a crawl, which could be treated as a log in addition to the regular text log. Of course, for long-running crawls we would need to break this up into smaller chunks, e.g. a crawl could be running for several days!
A video of a full crawl could indeed grow to a massive size. But for starters, relying on the user to limit the crawl according to their available capacity could be enough, especially if it is conceptualized as a debugging tool. In the end this could be an m3u list with subtitled videos. :)
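To make the m3u idea concrete, here is a minimal sketch of how a playlist over chunked recordings might be written. It assumes the crawl video has already been split into segment files; the function name `build_m3u` and the file names are hypothetical and not part of the crawler.

```python
def build_m3u(segments):
    """Build an extended M3U playlist.

    segments: list of (duration_seconds, title, path) tuples, one per
    video chunk (hypothetical structure, for illustration only).
    """
    lines = ["#EXTM3U"]  # header line of the extended M3U format
    for duration, title, path in segments:
        # #EXTINF carries the length in whole seconds and a display title
        lines.append(f"#EXTINF:{int(round(duration))},{title}")
        lines.append(path)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    playlist = build_m3u([
        (600, "crawl part 1", "crawl-0001.mp4"),
        (412.4, "crawl part 2", "crawl-0002.mp4"),
    ])
    print(playlist, end="")
```

Each chunk could then carry its own subtitle file next to it, so the playlist itself only needs to reference the videos.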
The screencast option is very useful to observe how websites might cause the crawler to hang, for instance because of cookie banners, captchas, etc.
It would be great if there was a mode that, instead of capturing a web archive, would capture a video of a single worker crawling. This could be used to check whether any issues are to be expected during crawling. As the crawling browser doesn't feature a full user interface that displays the current URL, a plaintext subtitles file (in srt format or similar) could be generated so that the URL appears in the video.
Obviously it would be best limited to a small number of pages or a short overall crawl time.
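As a rough illustration of the subtitle idea, the sketch below turns a list of page-load times into SRT cues so that each URL stays on screen until the next page starts. It assumes the crawler can report, for each page, the offset in seconds from the start of the recording; the names `srt_timestamp` and `build_srt` and the input structure are hypothetical, not an existing crawler API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(page_loads, video_length):
    """Build SRT text from page loads.

    page_loads: list of (start_seconds, url), sorted by start time
                (hypothetical structure, for illustration only).
    video_length: total length of the recording in seconds.
    """
    cues = []
    for index, (start, url) in enumerate(page_loads, start=1):
        # Show each URL until the next page starts, or until the video ends.
        if index < len(page_loads):
            end = page_loads[index][0]
        else:
            end = video_length
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{url}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(build_srt(
        [(0.0, "https://example.com/"), (12.5, "https://example.com/about")],
        video_length=30.0,
    ), end="")
```

Keeping the subtitles in a separate plaintext file, rather than burning them into the video, means the URL log stays searchable and can be regenerated without re-encoding.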