webrecorder / warcit

Convert Directories, Files and ZIP Files to Web Archives (WARC)
https://pypi.python.org/pypi/warcit
Apache License 2.0

The pages list in ReplayWeb.page doesn't populate with pages from WARC files created in warcit #32

Open Shrinks99 opened 1 year ago

Shrinks99 commented 1 year ago

Context

This problem has come up in a few forum posts (and in my own projects!) and deserves a real issue :)

What did you expect to happen? What happened instead?

WARC files created with warcit display valid URLs in ReplayWeb.page's URLs list, but none of those URLs are listed as pages.

Step-by-step reproduction instructions

  1. Create a WARC with warcit
  2. Load the resulting WARC file into ReplayWeb.page
  3. Note the URL list (full of URLs) and the empty pages list
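
To help narrow down the steps above, here is a minimal diagnostic sketch (not part of warcit or ReplayWeb.page) that lists the record headers in a WARC file using only the Python standard library. It makes no claim about how ReplayWeb.page decides what counts as a page; it only shows which `WARC-Type`, `WARC-Target-URI`, and `Content-Type` values the file contains, which may be useful to attach to a report. The function names `iter_warc_headers` and `list_records` are hypothetical, and the parser assumes simple, well-formed records without folded header lines.

```python
import gzip


def iter_warc_headers(stream):
    """Yield the WARC header fields of each record as a dict (lowercased keys).

    Assumes well-formed records: a "WARC/x.y" version line, header lines,
    a blank line, then a body of exactly Content-Length bytes.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line where a record should start: {line!r}")

        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()

        # Skip over the record body without inspecting it.
        stream.read(int(headers.get("content-length", 0)))
        yield headers


def list_records(path):
    """Return (WARC-Type, WARC-Target-URI, Content-Type) for every record in a file."""
    # gzip.open reads multi-member gzip files, which is how .warc.gz is stored.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return [
            (h.get("warc-type"), h.get("warc-target-uri"), h.get("content-type"))
            for h in iter_warc_headers(f)
        ]
```

For example, `for rec in list_records("example.warc.gz"): print(rec)` would print one tuple per record, where `example.warc.gz` stands in for whatever file warcit produced in step 1.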