sul-dlss / dor-services-app

A Rails application exposing Digital Object Registry functions as a RESTful HTTP API
https://sul-dlss.github.io/dor-services-app/

Shelve from preservation error: mismatch between expected file size and received file size #5138

Closed honeybadger[bot] closed 1 month ago

honeybadger[bot] commented 1 month ago

If I understand the file size check logic in app/services/shelvable_files_stager.rb correctly:

This error reports a size mismatch for a druid that I tried to shelve from preservation (link to druid):

[dor-services-app/stage] RuntimeError: File copied from preservation was not the expected size. Expected 1039 bytes for zy609tw5585_0001.xml; received 2821 bytes.

But the received size for zy609tw5585_0001.xml is correct, that's the size of the file:

ls -l zy609tw5585_0001.xml 
-rw-r----- 1 pres pres 2821 Jul 24 17:22 zy609tw5585_0001.xml

That suggests the expected length of 1039 is not correct. No file in this item has a size of 1039 bytes. So where does the 1039 bytes come from? Here's my guess:

If you compress the file by streaming the output through gzip, you get 1039 bytes:

cat zy609tw5585_0001.xml | gzip > zy609tw5585_0001.gz
ls -l
total 8
-rw-r--r-- 1 pres pres 1039 Jul 24 17:33 zy609tw5585_0001.gz
-rw-r----- 1 pres pres 2821 Jul 24 17:22 zy609tw5585_0001.xml

If prescat gzips some or all files for transfer, then the Content-Length is probably the gzipped size of those files, which won't match the size of the file after it's decompressed. It might be more reliable to take the expected size from Cocina, or even to use checksums (at the cost of time and processing power).
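To illustrate the guess above, here is a minimal Ruby sketch. It is not the actual code in shelvable_files_stager.rb. It assumes the response body is gzip-encoded in transit and that the expected size and an MD5 digest are available from trusted metadata such as Cocina. The helper name `verify_received_file` and the sample content are hypothetical.

```ruby
require "zlib"
require "digest"

# Hypothetical helper (not from dor-services-app): checks a received file body
# against a size and MD5 taken from trusted metadata, instead of comparing it
# to the HTTP Content-Length header. Returns a list of mismatch messages.
def verify_received_file(body, expected_size:, expected_md5:)
  errors = []
  unless body.bytesize == expected_size
    errors << "size #{body.bytesize} != expected #{expected_size}"
  end
  actual_md5 = Digest::MD5.hexdigest(body)
  unless actual_md5 == expected_md5
    errors << "md5 #{actual_md5} != expected #{expected_md5}"
  end
  errors
end

# Stand-in for the XML file (assumed content); repetitive text compresses well.
original = "<page><line>example text</line></page>\n" * 70

on_the_wire = Zlib.gzip(original)      # bytes a gzip-encoded response would carry
content_length = on_the_wire.bytesize  # what Content-Length would report in that case
received = Zlib.gunzip(on_the_wire)    # bytes the client ends up with after decoding

puts "Content-Length (gzipped): #{content_length}"
puts "Bytes after decompression: #{received.bytesize}"

errors = verify_received_file(
  received,
  expected_size: original.bytesize,
  expected_md5: Digest::MD5.hexdigest(original)
)
puts errors.empty? ? "file verified" : errors.join("; ")
```

Comparing against the metadata size is cheap. The digest comparison catches corruption that leaves the size unchanged, but it requires reading the whole file.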

View full backtrace and more info at honeybadger.io