anastasia closed this issue 7 years ago
It turns out these WARCs are all truncated: something happened while they were being saved to perma-mirror. Either the saving process hung mid-write, or Celery inappropriately terminated the capture process on a timeout. Investigating further...
I manually stepped through the method below using rdb (Celery's remote debugger):
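def close_warc_after_writing(self, out):
    out.flush()  # push any buffered WARC data to the temp file
    out.seek(0)  # rewind so the upload reads from the beginning
    # copy the temp file to permanent storage, replacing any earlier copy
    default_storage.store_file(out, self.warc_storage_file(), overwrite=True)
    out.close()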
If you contrive a partial write inside default_storage.store_file, you get a truncated WARC that looks just like the failures LOCKSS reported.
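A quick way to flag WARCs truncated like this is to check whether the stored file decompresses all the way to the end. The sketch below assumes the WARCs are gzip-compressed (.warc.gz); looks_truncated and partial_write are hypothetical helpers, not part of Perma or Django, and partial_write only simulates a store that stops partway through.

```python
import gzip
import zlib


def looks_truncated(warc_gz: bytes) -> bool:
    """Return True if a gzip-compressed WARC cannot be fully decompressed.

    A write that stops partway leaves the last gzip member without its
    end-of-stream marker or 8-byte trailer, so gzip.decompress raises.
    """
    if not warc_gz:
        return True  # treat an empty file as a failed write
    try:
        gzip.decompress(warc_gz)
    except (EOFError, OSError, zlib.error):
        return True
    return False


def partial_write(data: bytes, fraction: float) -> bytes:
    """Simulate a store that stops after writing only part of the file."""
    return data[: int(len(data) * fraction)]


record = b"WARC/1.0\r\nWARC-Type: response\r\n\r\n" + bytes(range(256)) * 16
full = gzip.compress(record)
print(looks_truncated(full))                      # False
print(looks_truncated(partial_write(full, 0.5)))  # True
```

One caveat: if a WARC is a concatenation of per-record gzip members and the write happens to stop exactly at a member boundary, this check passes, so comparing the stored size against the local temp file's size is a useful second check.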
Hopefully this will happen far less frequently with the tweaked capture code. Closing for now; will reopen if this behavior resurfaces.
(We now allow 2 minutes for recovery after a soft timeout, and we found at least one reason SoftTimeOut was previously being ignored: https://github.com/harvard-lil/perma/commit/6ac4ba742d2bb2b0f7f9502841353f8c10967587)
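In Celery, hitting a task's soft time limit raises SoftTimeLimitExceeded inside the task, while hitting the hard time limit kills the process running it, which can cut a WARC off mid-write. The sketch below illustrates the recovery-window idea under stated assumptions: the hard limit is set 2 minutes past the soft limit, and the constants, run_capture, and the local SoftTimeLimitExceeded stand-in (used in place of celery.exceptions.SoftTimeLimitExceeded so the example runs without Celery) are all hypothetical, not Perma's actual code. It also shows one general way a soft timeout can get swallowed: a broad "except Exception" around capture steps catches it too, unless it is re-raised first. This is not a claim about what the linked commit fixed.

```python
CAPTURE_SOFT_TIME_LIMIT = 180  # hypothetical soft limit, in seconds
RECOVERY_WINDOW = 120  # the 2 minutes allowed for recovery
CAPTURE_TIME_LIMIT = CAPTURE_SOFT_TIME_LIMIT + RECOVERY_WINDOW  # hard limit


class SoftTimeLimitExceeded(Exception):
    """Local stand-in for celery.exceptions.SoftTimeLimitExceeded."""


def run_capture(steps, save_warc):
    """Run capture steps in order; on a soft timeout, stop early but still save.

    Returns True if the soft time limit interrupted the capture.
    """
    timed_out = False
    try:
        for step in steps:
            try:
                step()
            except SoftTimeLimitExceeded:
                raise  # don't let the broad handler below swallow it
            except Exception:
                pass  # tolerate an individual step failing and keep going
    except SoftTimeLimitExceeded:
        timed_out = True
    # Finish writing the WARC inside the recovery window, before the hard limit.
    save_warc()
    return timed_out
```

The ordering of the inner handlers matters: Python tries except clauses top to bottom, so the specific soft-timeout clause has to come before the generic one.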
We should consider recording different types of failure in separate fields. For instance, I think perma.cc/XR5M-BWGY should be marked as
primary_capture.status == "success"
because its WARC can be replayed. However, we're currently marking its primary capture status as failed, possibly because of a failed S3 transfer.
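One way this split might look, sketched with hypothetical field names (capture_status, transfer_status) rather than Perma's real model: whether a replayable WARC was produced is recorded separately from whether it reached S3, so a failed transfer no longer marks a replayable capture as failed.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class CaptureOutcome:
    """Hypothetical per-stage statuses replacing a single combined status."""

    capture_status: Status = Status.PENDING  # did we record a replayable WARC?
    transfer_status: Status = Status.PENDING  # did the copy to S3 succeed?

    @property
    def replayable(self) -> bool:
        # Replay depends only on the capture, not on the transfer.
        return self.capture_status is Status.SUCCESS

    @property
    def needs_retransfer(self) -> bool:
        # A good WARC whose upload failed can be retried without recapturing.
        return (
            self.capture_status is Status.SUCCESS
            and self.transfer_status is Status.FAILED
        )
```

With separate fields, a case like the one above would read as a successful capture with a failed transfer, which also points directly at the remedy (retry the upload) instead of a recapture.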