zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

Boolean array uses same amount of space as uint8 #2066

Status: Closed (closed by haarisr 1 month ago)

haarisr commented 1 month ago

Zarr version

2.18.2

Numcodecs version

0.13.0

Python Version

3.12.4

Operating System

linux

Installation

pip

Description

Saving a boolean array with zarr uses the same amount of space as a uint8 array:

Type               : zarr.core.Array
Data type          : bool
Shape              : (400000, 128, 128)
Chunk shape        : (1, 128, 128)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 6553600000 (6.1G)
No. bytes stored   : 370
Storage ratio      : 17712432.4
Chunks initialized : 0/400000

Type               : zarr.core.Array
Data type          : uint8
Shape              : (400000, 128, 128)
Chunk shape        : (1, 128, 128)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 6553600000 (6.1G)
No. bytes stored   : 366
Storage ratio      : 17906010.9
Chunks initialized : 0/400000

Steps to reproduce

import numpy as np
import zarr

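# Create an empty boolean array and print its storage summary.
arr = zarr.open_array("./image", shape=(400000, 128, 128), chunks=(1, 128, 128), dtype=np.bool_, mode="w")
print(arr.info)

# Recreate the array at the same path with uint8 for comparison.
arr = zarr.open_array("./image", shape=(400000, 128, 128), chunks=(1, 128, 128), dtype=np.uint8, mode="w")
print(arr.info)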

Additional output

No response

d-v-b commented 1 month ago

This is expected: the boolean data type occupies 1 byte per element, the same as uint8.
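A minimal NumPy-only sketch of this point, using the shapes from the report. The variable names are illustrative, and the bit-packing step is an assumption about what a smaller representation could look like, not something zarr does by default.

```python
import numpy as np

# Both dtypes use one byte per element.
bool_itemsize = np.dtype(np.bool_).itemsize
uint8_itemsize = np.dtype(np.uint8).itemsize

# Uncompressed size of the full array from the report, in bytes.
shape = (400000, 128, 128)
total_nbytes = shape[0] * shape[1] * shape[2] * bool_itemsize

# One chunk from the report: 128 x 128 boolean values.
chunk = np.zeros((128, 128), dtype=np.bool_)
chunk_nbytes = chunk.nbytes

# Illustration only: pack eight booleans into each byte.
packed = np.packbits(chunk, axis=None)
packed_nbytes = packed.nbytes

# Round trip back to booleans to confirm nothing is lost.
restored = (
    np.unpackbits(packed, count=chunk.size)
    .astype(np.bool_)
    .reshape(chunk.shape)
)

print(bool_itemsize, uint8_itemsize, total_nbytes, chunk_nbytes, packed_nbytes)
```

The total matches the "No. bytes" line in both info dumps. Both arrays have 0 of 400000 chunks initialized, so the "No. bytes stored" figures reflect only metadata, not chunk data.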

haarisr commented 1 month ago

Oh, that's interesting, thanks.

I'll close the issue.