nel-lab / mesmerize-core

High-level pandas-based API for batch analysis of Calcium Imaging data using CaImAn

int32 not sufficient for nbytes #279

Closed: nspiller closed this 6 months ago

nspiller commented 6 months ago

The dtype used to compute self.nbytes defaults to int32, which is not sufficient for larger datasets.

E.g.

>>> import numpy as np
>>> shape = (26191, 512, 512)
>>> itemsize = 8
>>> np.prod(shape + (itemsize,), dtype=np.int32)
-908066816

>>> np.prod(shape + (itemsize,), dtype=np.int64)
54926508032
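
For context, the failure pattern is an nbytes-style property that multiplies the shape by the item size via np.prod and lets numpy choose the accumulator dtype. A minimal sketch of that pattern (hypothetical names, not the actual mesmerize-core code):

import numpy as np

class LazyVideo:
    """Hypothetical stand-in for an object exposing nbytes."""

    def __init__(self, shape: tuple, itemsize: int):
        self.shape = shape
        self.itemsize = itemsize

    @property
    def nbytes(self) -> int:
        # Without an explicit dtype, np.prod accumulates in the platform
        # default integer type and can silently wrap around.
        return int(np.prod(self.shape + (self.itemsize,)))

v = LazyVideo(shape=(26191, 512, 512), itemsize=8)
print(v.nbytes)  # 54926508032 where the default int is 64-bit, negative where it is 32-bit
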
kushalkolar commented 6 months ago

numpy usually defaults to int64 for int types if it's not specified. Do you have an example of where you encountered this?

In [8]: import numpy as np

In [9]: shape = (1_000_000, 1_000_000, 1_000_000); itemsize = 8

In [10]: np.prod(shape + (itemsize,))
Out[10]: 8000000000000000000

In [11]: np.prod(shape + (itemsize,)).dtype
Out[11]: dtype('int64')
nspiller commented 6 months ago

I did some more testing and it is a numpy + OS issue.

On Linux with numpy v1.26.2:

>>> import numpy as np
>>> type(np.prod((1_000_000, 1_000_000)))
<class 'numpy.int64'>

On Windows with numpy v1.26.0:

>>> import numpy as np
>>> type(np.prod((1_000_000, 1_000_000)))
<class 'numpy.int32'>
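
The split comes from numpy's default integer type: on NumPy 1.x this is np.int_, which maps to the platform's C long (64-bit on Linux/macOS, 32-bit on Windows), and np.prod accumulates in it when no dtype is given. A quick check, assuming NumPy 1.x:

import numpy as np

# np.int_ is numpy's default integer type; on NumPy 1.x it is the
# platform's C long, which np.prod uses when no dtype is passed.
print(np.dtype(np.int_))  # int64 on Linux/macOS, int32 on Windows
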

TIL that the default int dtype is platform dependent, which explains this behavior. So, I guess that to keep it Windows-compatible, dtype=np.int64 needs to be explicitly defined upon array creation.
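
A minimal sketch of that fix (a hypothetical helper, not the actual mesmerize-core patch):

import numpy as np

def nbytes_of(shape: tuple, itemsize: int) -> int:
    """Byte count of an array, safe on platforms with a 32-bit default int."""
    # dtype=np.int64 fixes the accumulator type, so the product cannot
    # wrap around regardless of the platform's C long size.
    return int(np.prod(shape + (itemsize,), dtype=np.int64))

print(nbytes_of((26191, 512, 512), 8))  # 54926508032 on every platform

Only the dtype passed to np.prod matters here; the shape tuple itself does not need to be converted.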

kushalkolar commented 6 months ago

:man_facepalming: It's always Windows

Thanks for the contrib! :)