expected size = content-length from the HTTP header
received = actual bytes written to disk
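The two definitions above can be pictured with a minimal Ruby sketch. This is not the actual code in shelvable_files_stager.rb; the method name `check_copied_size!` and its keyword arguments are hypothetical, and only the wording of the error message is taken from the report below.

```ruby
# Hypothetical sketch of the size check as I understand it;
# the real implementation may differ.
def check_copied_size!(filename:, content_length:, bytes_written:)
  # content_length: the value from the HTTP Content-Length header (expected)
  # bytes_written:  the number of bytes actually written to disk (received)
  return if content_length == bytes_written

  raise "File copied from preservation was not the expected size. " \
        "Expected #{content_length} bytes for #{filename}; " \
        "received #{bytes_written} bytes."
end
```

With this sketch, an expected size of 1039 and a received size of 2821 would raise, even though the bytes on disk are complete.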
This error reports a size mismatch for a druid that I tried to shelve from preservation (link to druid):
[dor-services-app/stage] RuntimeError: File copied from preservation was not the expected size. Expected 1039 bytes for zy609tw5585_0001.xml; received 2821 bytes.
But the received size for zy609tw5585_0001.xml is correct: 2821 bytes is the actual size of the file.
That suggests the expected length of 1039 is not correct. No file in this item has a size of 1039 bytes. So where do the 1039 bytes come from? Here's my guess:
If you compress the file by streaming it through gzip, you get 1039 bytes.
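The effect is easy to see in Ruby with the standard `zlib` library. This is only an illustration: the helper name `gzip_sizes` is hypothetical, and the sample XML is placeholder content rather than the real zy609tw5585_0001.xml, so it will not produce the 1039 and 2821 figures.

```ruby
require 'zlib'

# Hypothetical helper: returns the byte counts before and after gzip,
# plus the size after decompressing again.
def gzip_sizes(data)
  gzipped = Zlib.gzip(data)
  {
    original: data.bytesize,
    gzipped: gzipped.bytesize,
    round_trip: Zlib.gunzip(gzipped).bytesize
  }
end

# Placeholder content standing in for the real file.
sample_xml = "<?xml version=\"1.0\"?>\n<page>\n" +
             ("  <line>sample text</line>\n" * 100) +
             "</page>\n"

sizes = gzip_sizes(sample_xml)
puts "original bytes:   #{sizes[:original]}"
puts "gzipped bytes:    #{sizes[:gzipped]}"
puts "after gunzip:     #{sizes[:round_trip]}"
```

For repetitive text such as XML, the gzipped count is smaller than the original, and the decompressed count matches the original again. That is the pattern the error message shows.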
If prescat gzips some or all files for transfer, then the content-length is probably the gzipped size of those files, which won't match the size of the file after it's decompressed. It might be more reliable to take the expected size from Cocina, or even to use checksums (at the cost of time and processing power).
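A rough sketch of that alternative follows. It assumes the expected size and, optionally, an MD5 digest are already available from the item's metadata (for example Cocina). The helper name `copied_file_valid?` and its parameters are hypothetical and not part of dor-services-app.

```ruby
require 'digest'

# Hypothetical helper: validates a copied file against metadata values
# instead of the HTTP Content-Length header.
def copied_file_valid?(path, expected_size:, expected_md5: nil)
  # Cheap check first: compare the size on disk with the metadata size.
  return false unless File.size(path) == expected_size
  # Skip the checksum when none was supplied.
  return true if expected_md5.nil?

  # More expensive check: read the whole file and compare digests.
  Digest::MD5.file(path).hexdigest == expected_md5
end
```

The size comparison is nearly free, while the checksum reads the entire file, so making the digest optional keeps the time and processing cost a deliberate choice.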
All of this assumes I understand the file size check logic in app/services/shelvable_files_stager.rb correctly.