YousufSSyed opened 1 year ago
Thanks @YousufSSyed, it looks like you've hit a bug here, and I agree that pywb should handle a missing WARC file more gracefully. In the case you describe above, are you also deleting the corresponding CDX/CDXJ index entries for the deleted WARC, or just the WARC itself? The latter might explain why pywb still expects there to be an archive file to load from.
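One way to test the stale-index hypothesis is to scan the collection's CDXJ files for `filename` values that no longer exist in the archive directory. This is a minimal sketch, not part of pywb itself. It assumes pywb's default collection layout (`collections/<coll>/archive/` for WARCs, `collections/<coll>/indexes/*.cdxj` for indexes) and CDXJ lines of the form `urlkey timestamp {json}` where the JSON carries a `"filename"` key. The function name `stale_cdxj_entries` and the script name `find_stale.py` are hypothetical.

```python
import json
import sys
from pathlib import Path


def stale_cdxj_entries(collection_dir):
    """Yield (index_file, line_no, filename) for CDXJ lines whose WARC is missing.

    Assumed layout (sketch only, matching pywb's default collection structure):
      <collection_dir>/archive/  -> WARC files
      <collection_dir>/indexes/  -> *.cdxj index files
    """
    coll = Path(collection_dir)
    archive_dir = coll / "archive"
    for index_file in sorted((coll / "indexes").glob("*.cdxj")):
        with index_file.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                # Split into urlkey, timestamp, and the JSON blob (which may contain spaces).
                parts = line.rstrip("\n").split(" ", 2)
                if len(parts) < 3:
                    continue  # skip blank or malformed lines
                try:
                    fields = json.loads(parts[2])
                except json.JSONDecodeError:
                    continue  # not a JSON-style CDXJ line
                if not isinstance(fields, dict):
                    continue
                filename = fields.get("filename")
                # Report entries that point at a WARC no longer present on disk.
                if filename and not (archive_dir / filename).exists():
                    yield index_file, line_no, filename


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python find_stale.py collections/archive
    for index_file, line_no, filename in stale_cdxj_entries(sys.argv[1]):
        print(f"{index_file}:{line_no}: missing {filename}")
```

If this still lists the deleted WARC after running `wb-manager reindex`, the index on disk is still pointing pywb at a file that is gone.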
No, I haven't been deleting index files. However, I did run the `wb-manager reindex` command whenever I ran into the errors.
Is there any testing or reproduction you'd like me to do?
I'm going to try to reproduce this on my end tomorrow and will let you know!
I'm not too familiar with how pywb works, but I'd like it so that if a WARC is deleted (whether intentionally or by accident), pywb recognizes that it's no longer there and doesn't try to show it. When I record a page, delete the WARC for it, run `wb-manager reindex {collection}`, and then go to `.../{collection}/{url}`, I get this:
`{'args': {'coll': 'archive', 'type': 'replay', 'metadata': {}}, 'error': '{"message": "rec-20230401040109730649-file.warc.gz: [Errno 2] No such file or directory: \'/Users/yousuf/.pyenv/versions/3.9.16/lib/python3.9/collections/archive/archive/rec-20230401040109730649-file.warc.gz\'",`
Environment