imi-bigpicture / wsidicom

Python package for reading DICOM WSI file sets.
Apache License 2.0

Automatic selection of offset table type #76

Closed: erikogabrielsson closed 9 months ago

erikogabrielsson commented 1 year ago

The DICOM file writer can write a basic offset table, an extended offset table, or no offset table. A basic offset table (BOT) can only index image data up to about 4 GB, because its offsets are 32-bit unsigned integers. Currently, if that size is exceeded, writing with a BOT fails only after the file has been written. An extended offset table (EOT) should be used for larger files.

The BOT and EOT are both placed before the image data, but can only be written after all the image data has been written. Currently, space is therefore reserved for either the BOT or the EOT before writing the image data, and filled in afterwards.
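To illustrate the reserve-then-fill pattern described above, here is a minimal sketch. It is not wsidicom's writer: the function name `write_with_reserved_table` is hypothetical, and the layout is deliberately simplified (no DICOM item tags or headers, just a table of 32-bit little-endian offsets followed by the raw frames).

```python
import io
import struct
from typing import BinaryIO, List, Sequence


def write_with_reserved_table(stream: BinaryIO, frames: Sequence[bytes]) -> List[int]:
    """Reserve a 32-bit offset table, write the frames, then back-fill the table.

    Hypothetical helper with a simplified layout, not valid DICOM.
    """
    table_start = stream.tell()
    # Reserve 4 bytes per frame; the real offsets are not known yet.
    stream.write(b"\x00" * 4 * len(frames))
    data_start = stream.tell()

    offsets = []
    for frame in frames:
        # Offsets are measured from the start of the frame data.
        offsets.append(stream.tell() - data_start)
        stream.write(frame)
    end = stream.tell()

    # Back-fill the reserved space. struct.pack raises struct.error if an
    # offset does not fit in 32 bits, which is only detected at this point,
    # after all the frame data has already been written.
    stream.seek(table_start)
    stream.write(struct.pack(f"<{len(offsets)}I", *offsets))
    stream.seek(end)
    return offsets


buffer = io.BytesIO()
offsets = write_with_reserved_table(buffer, [b"aaaa", b"bb", b"cccccc"])
```

Because the table size is fixed when the space is reserved, switching from 32-bit to 64-bit entries afterwards would shift all the frame data, which is why the table type has to be chosen up front or the file rewritten.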

It would be nice if the writer automatically selected the best offset table to use based on the image data to write. When writing from DICOM this should be relatively easy, as there is no re-encoding. When the writer is used in wsidicomizer to convert non-DICOM files, the size to write can be harder to estimate, as for example JPEG headers are added or the image data is re-encoded.
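When the encoded frame lengths are known in advance (as when writing from DICOM), the selection could look roughly like the sketch below. The names `OffsetTableType`, `frame_offsets`, and `select_offset_table` are hypothetical here and not necessarily the library's API. The sketch assumes one fragment per frame, an 8-byte item header per fragment, and fragments padded to even length.

```python
from enum import Enum
from typing import Iterable, List

MAX_BOT_OFFSET = 2**32 - 1  # largest value a 32-bit unsigned offset can hold
ITEM_HEADER_SIZE = 8  # 4-byte item tag + 4-byte item length


class OffsetTableType(Enum):
    """Hypothetical enum for this sketch."""

    BASIC = "bot"
    EXTENDED = "eot"


def frame_offsets(frame_lengths: Iterable[int]) -> List[int]:
    """Return the offset of each frame's item tag from the first frame item."""
    offsets = []
    position = 0
    for length in frame_lengths:
        offsets.append(position)
        padded = length + (length % 2)  # fragments are padded to even length
        position += ITEM_HEADER_SIZE + padded
    return offsets


def select_offset_table(frame_lengths: Iterable[int]) -> OffsetTableType:
    """Pick BASIC if every frame offset fits in 32 bits, otherwise EXTENDED."""
    offsets = frame_offsets(frame_lengths)
    # Offsets are increasing, so only the last one needs checking.
    if offsets and offsets[-1] > MAX_BOT_OFFSET:
        return OffsetTableType.EXTENDED
    return OffsetTableType.BASIC
```

For conversions where frames are re-encoded, the lengths are not known before writing, so this check would only work with an estimate or after encoding.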

A simpler alternative could be to copy the unfinished file, but with an EOT, if the file becomes too large for a BOT. As the image data would already be correctly prepared, this could be a simple and quick operation.

erikogabrielsson commented 9 months ago

Closed by #132