
harvard-lil / perma

Indelible links


primary_capture marked as 'failed' when capture succeeded but s3 transfer failed #1991

Closed anastasia closed 7 years ago

anastasia commented 7 years ago

We should consider having separate fields for different types of failure. For instance, I think perma.cc/XR5M-BWGY should have primary_capture.status == "success", because its WARC can be replayed. However, we're marking the primary capture status as failed, possibly because the S3 transfer failed.
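A minimal sketch of the "separate fields" idea, assuming one status for the capture itself and another for the transfer step. The names PrimaryCapture, capture_status, transfer_status, and needs_retransfer are hypothetical and are not perma's actual model fields.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class PrimaryCapture:
    # Hypothetical split: one status per stage instead of a single status field.
    capture_status: Status = Status.PENDING   # did we produce a replayable WARC?
    transfer_status: Status = Status.PENDING  # did the upload/mirror step finish?

    @property
    def replayable(self) -> bool:
        # Replay depends only on the capture step, not on the transfer.
        return self.capture_status is Status.SUCCESS

    @property
    def needs_retransfer(self) -> bool:
        # A good capture whose transfer failed can be retried without recapturing.
        return (self.capture_status is Status.SUCCESS
                and self.transfer_status is Status.FAILED)


pc = PrimaryCapture(capture_status=Status.SUCCESS, transfer_status=Status.FAILED)
print(pc.replayable, pc.needs_retransfer)  # True True
```

With the two concerns split, a link like the one above would show as replayable while still being flagged for a retried transfer.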

rebeccacremona commented 7 years ago

It turns out these WARCs are all truncated: something went wrong while they were being saved to perma-mirror. Either the saving process hung mid-write, or Celery terminated the capture process prematurely because of a timeout. Investigating further...
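One way to spot truncated files like these, assuming the WARCs are gzip-compressed (.warc.gz): Python's gzip module raises EOFError when a stream ends before its end-of-stream marker. The helper name is_complete_gzip is hypothetical, a sketch rather than perma's own check.

```python
import gzip
import io
import zlib


def is_complete_gzip(data: bytes) -> bool:
    """Return False if a gzip stream (e.g. a .warc.gz) is cut off or corrupt."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            # Read in chunks so a large WARC is not held in memory all at once.
            while gz.read(64 * 1024):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
    return True


full = gzip.compress(b"".join(str(i).encode() for i in range(5000)))
print(is_complete_gzip(full))                    # True
print(is_complete_gzip(full[: len(full) // 2]))  # False
```

One caveat: WARCs are often compressed record by record (multi-member gzip), so a file cut exactly at a record boundary would still pass this check; comparing against the expected size catches that case.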

rebeccacremona commented 7 years ago

I stepped through the method below manually using rdb:

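    def close_warc_after_writing(self, out):
        out.flush()  # push any buffered bytes into the underlying file
        out.seek(0)  # rewind so store_file reads the WARC from the start
        # upload to storage, replacing any existing copy at that path
        default_storage.store_file(out, self.warc_storage_file(), overwrite=True)
        out.close()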

If you contrive a partial write inside default_storage.store_file, you get a truncated WARC that looks exactly like the failures LOCKSS reported to us.
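A sketch of that experiment plus one possible guard, assuming in-memory file objects stand in for both the WARC and its storage destination. partial_store, full_store, and store_and_verify are hypothetical helpers, not perma's default_storage API: the first injects a partial write, and the last rejects any store whose byte count doesn't match the source.

```python
import io


def full_store(src, dest):
    """Stand-in for a normal upload: copy every byte."""
    dest.write(src.read())


def partial_store(src, dest, fraction=0.5):
    """Hypothetical fault injection: copy only part of src, like a hung or killed upload."""
    data = src.read()
    dest.write(data[: int(len(data) * fraction)])


def store_and_verify(src, dest, store):
    """Store src into dest with `store`, then compare byte counts."""
    src.seek(0, io.SEEK_END)
    expected = src.tell()
    src.seek(0)  # same rewind as out.seek(0) in close_warc_after_writing
    store(src, dest)
    written = len(dest.getvalue())
    if written != expected:
        raise OSError(f"short write: {written} of {expected} bytes stored")
    return written


print(store_and_verify(io.BytesIO(b"x" * 1000), io.BytesIO(), full_store))  # 1000
```

Against a real remote store, the check would compare against the stored object's reported size instead of an in-memory buffer.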

rebeccacremona commented 7 years ago

Hopefully this will happen far less frequently with the tweaked capture code. Closing for now; will reopen if this behavior resurfaces.

rebeccacremona commented 7 years ago

(We now allow 2 minutes for recovery after a soft timeout, and we found at least one reason SoftTimeOut was previously being ignored: https://github.com/harvard-lil/perma/commit/6ac4ba742d2bb2b0f7f9502841353f8c10967587)
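A plain-Python sketch of both points, assuming a hard limit set to the soft limit plus a 2-minute window (in Celery, roughly time_limit = soft_time_limit + 120). The SoftTimeOut class here is a hypothetical stand-in; Celery's own exception is celery.exceptions.SoftTimeLimitExceeded, which subclasses Exception. The broad handler shows one common way a soft timeout gets swallowed. It illustrates the pattern and is not necessarily the cause fixed in the commit above.

```python
RECOVERY_SECONDS = 2 * 60  # the 2-minute recovery window after the soft timeout


class SoftTimeOut(Exception):
    """Hypothetical stand-in for the soft-timeout exception."""


def hard_limit_for(soft_limit_seconds: int) -> int:
    # Leave RECOVERY_SECONDS between the soft signal and the hard kill.
    return soft_limit_seconds + RECOVERY_SECONDS


def step_swallowing(step):
    # Anti-pattern: `except Exception` also catches the soft timeout, so the
    # task keeps going until the hard limit kills it, possibly mid-write.
    try:
        step()
    except Exception:
        return "ignored"
    return "done"


def step_respecting(step):
    # Let the soft timeout propagate so the task can clean up in time.
    try:
        step()
    except SoftTimeOut:
        raise
    except Exception:
        return "ignored"
    return "done"


print(hard_limit_for(300))  # 420
```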