Closed nvanderperren closed 3 years ago
This should be expected for pages, since there is no page detection (which was always a bit experimental, since other tools don't have a concept of pages, only URLs). The page detection was a work-around since WR Player did not have a way to load URLs. However, the URL search in replayweb.page should allow searching by HTML as well as all other types of resources.
The plan is that the 'page detection' will be part of the WACZ format, and the detection can happen optionally. Are you able to search by URLs from the URL tab, or is that also blank? If it is blank, then something is likely wrong. Would you be able to share the WARC file in question?
It's also blank if I search by URLs. I can share the WARC file if you let me know how I can send it to you.
Further investigation 🕵️‍♀️
I don't have this problem with WARCs created with Browsertrix, Webrecorder Desktop, Squidwarc and Brozzler.
Tested the WARC shared from #23: the URLs are now showing up (there are no pages in this WARC), and I think the related indexing issues have been fixed.
Hi,
Some days ago I created a WARC file with Heritrix. Webrecorder Player discovers around 10.000 pages in it; replayweb.page finds 0. There certainly are pages and URLs in that WARC file. Is this a bug? Or maybe there is a dependency for the app that I had to install first?
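
For anyone wanting to check what a WARC contains independently of a replay tool, here is a minimal sketch that lists the `WARC-Target-URI` of every `response` record using only the Python standard library. It is not how replayweb.page or Webrecorder Player index files. The function names (`iter_warc_headers`, `open_warc`, `list_response_urls`) are hypothetical, and the sketch assumes a well-formed WARC with no folded (multi-line) headers.

```python
import gzip


def iter_warc_headers(stream):
    """Yield a dict of lower-cased WARC headers for each record in a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected record start: {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body (assumption: Content-Length is present and correct).
        stream.read(int(headers.get("content-length", 0)))
        yield headers


def open_warc(path):
    """Open a plain or gzip-compressed WARC, detected by the gzip magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def list_response_urls(path):
    """Return the target URI of every 'response' record in the file."""
    with open_warc(path) as stream:
        return [
            h["warc-target-uri"]
            for h in iter_warc_headers(stream)
            if h.get("warc-type") == "response" and "warc-target-uri" in h
        ]
```

If `list_response_urls("crawl.warc.gz")` (a placeholder file name) returns URLs while the URL tab stays blank, that would point at indexing in the replay tool rather than at the WARC itself.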