RomeSilvanus closed this 11 months ago
Judging by this line...
2023-12-12 21:01:02,173: [INFO]: Copied /data/Furry/sinnerdragon.keenspace.com/warc/sinnerdragon.keenspace.com-2021-06-13-7bc59e6e.cdx to /webarchive/collections/archive/archive
...it looks like you're adding .cdx files to the collection, and pywb is trying to interpret them as WARC files. You should only add the .warc.gz files, as pywb generates a new .cdxj file in a different folder.
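For anyone hitting the same thing, here is a rough sketch of how the import could be limited to WARC files only. It is not part of pywb: the helper names select_warcs and add_to_collection are made up for illustration, and it assumes wb-manager is on the PATH and is run from the directory that contains the collections folder. It shells out to wb-manager add once per file, which also gives a crude per-file progress line.

```python
from pathlib import Path
import subprocess

# Suffixes treated as WARC data; .cdx / .cdxj index files are deliberately excluded.
WARC_SUFFIXES = (".warc.gz", ".warc")


def select_warcs(paths):
    """Return only the paths whose file name ends in a WARC suffix (case-insensitive)."""
    return [p for p in map(Path, paths) if p.name.lower().endswith(WARC_SUFFIXES)]


def add_to_collection(root, collection):
    """Hypothetical helper: add every WARC under `root` to a pywb collection.

    Assumes `wb-manager` is installed and the current directory holds the
    `collections` folder. Runs `wb-manager add <collection> <file>` per file.
    """
    candidates = (p for p in Path(root).rglob("*") if p.is_file())
    warcs = sorted(select_warcs(candidates))
    for i, warc in enumerate(warcs, 1):
        print(f"[{i}/{len(warcs)}] adding {warc}")
        subprocess.run(["wb-manager", "add", collection, str(warc)], check=True)
    return warcs


# Example call (paths taken from the log line above; adjust to your setup):
# add_to_collection("/data/Furry/sinnerdragon.keenspace.com/warc", "archive")
```

Adding files one at a time is slower than passing them all in a single call, but it makes it obvious which file an error belongs to.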
Alright, I really didn't see this. My fault; I've made some changes.
Describe the bug
Apologies if this is not an issue with pywb at all. I have a lot of warc.gz files made with grab-site. When trying to import them, the following errors pop up:
I did a search in the Issues but nothing came up.
(I also want to mention here, without opening a new issue, that the Docker image should have an ENV option to enable recording.)
Steps to reproduce the bug
Make a warc with grab-site.
Expected behavior
Better error messages that tell me whether this is actually bad and can lead to problems, or whether it is just a warning. Also, adding large files takes hours, so a progress indicator would be nice.
Environment
Additional context