seung-lab / euclidean-distance-transform-3d

Euclidean distance & signed distance transform for multi-label 3D anisotropic images using marching parabolas.
GNU General Public License v3.0

Maximum allowed dimension exceeded #15

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

Hey, this seems like a really useful project. I played around with it a little and ran into an issue with size. If I use my own images or your example:

    import edt
    import numpy as np

    labels = np.ones(shape=(2000, 2000, 1000), dtype=np.uint32, order='F')
    dt = edt.edt(
      labels, anisotropy=(6, 6, 30),
      black_border=True, order='F',
      parallel=36
    )

I get the error "Maximum allowed dimension exceeded". I cannot find this error in your source code, so I cannot really comment on its source, but is there a fundamental size limitation in your algorithm?

Cheers, Duncan

william-silversmith commented 5 years ago

Hi Duncan,

It looks like that is a fundamental numpy error: under the hood it was using a 32-bit integer to represent the array length.

https://github.com/numpy/numpy/blob/8db1b077736e8418c60f2b802dab461be13f189d/numpy/core/src/multiarray/conversion_utils.c#L914

~/euclidean-distance-transform-3d/python/edt.pyx in edt.edt3dsq()
    397 
    398   cdef size_t voxels = sx * sy * sz
--> 399   cdef np.ndarray[float, ndim=1] output = np.zeros( (voxels,), dtype=np.float32 )
    400   cdef float[:] outputview = output
    401 

ValueError: Maximum allowed dimension exceeded
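
For scale, the requested volume is 2000 × 2000 × 1000 = 4,000,000,000 voxels, which already exceeds the 2,147,483,647 maximum of a signed 32-bit integer:

    voxels = 2000 * 2000 * 1000  # 4_000_000_000
    print(voxels > 2**31 - 1)    # True: too large for a signed 32-bit length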

The reason I'm using numpy to initialize the array is that it's the easiest way to have numpy manage its memory. It becomes difficult to achieve that without making a copy after the fact (e.g., if I just used calloc). There are various blog entries out there that show how to force a numpy array to take ownership of a pointer or other memory buffer without making a copy, but they didn't work when I was experimenting (I probably did something wrong).
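
As a rough sketch of the ownership problem (using a ctypes buffer here as a stand-in for a calloc'd pointer, not the edt code itself): wrapping existing memory gives a zero-copy view, but numpy never takes responsibility for freeing it.

    import ctypes
    import numpy as np

    voxels = 8
    buf = (ctypes.c_float * voxels)()  # C-side buffer, standing in for calloc'd memory
    arr = np.ctypeslib.as_array(buf)   # zero-copy view onto that buffer

    print(arr.flags['OWNDATA'])        # False: numpy will not free this memory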

If making a copy is okay, you can try using a pre-1.3 version of the EDT; however, those versions don't have the parallel feature.

ghost commented 5 years ago

I think if you changed line 398 to cast each size to int64_t or size_t, this would be resolved:

    cdef size_t voxels = <size_t>sx * <size_t>sy * <size_t>sz

It's just that sx, sy, and sz are defined as int in the code, so the multiplication is calculated in int, overflows, and only then is cast to size_t.
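
Here is a minimal illustration of that wraparound, using numpy's fixed-width scalars as a stand-in for C ints (not the actual pyx code): the product wraps to -294,967,296 at 32 bits, and reinterpreting that negative value as an unsigned size_t produces the enormous length that numpy rejects.

    import numpy as np

    sx, sy, sz = np.int32(2000), np.int32(2000), np.int32(1000)

    # Computed at 32-bit width, the product wraps:
    # 4_000_000_000 mod 2**32, read as signed, is -294_967_296.
    bad = sx * sy * sz  # emits a RuntimeWarning and wraps

    # Widening before multiplying gives the intended voxel count.
    good = np.int64(sx) * np.int64(sy) * np.int64(sz)

    print(bad, good)  # -294967296 4000000000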

Just an FYI: there is a fantastic package called pybind11 that makes integrating numpy with C++ code very straightforward. I am not proposing you refactor your code, but I think it could make your life easier on similar projects. :)

william-silversmith commented 5 years ago

D'oh. I'd previously ruled that out because I ran into that problem in https://github.com/seung-lab/connected-components-3d and only cross-checked the hpp file of the edt...

Thank you so much for your help! I was feeling really bummed about this.

william-silversmith commented 5 years ago

Also, I had been frustrated by the repetitive template code in the pyx. Thanks for the tip on pybind11!

ghost commented 5 years ago

Hey, thanks for being so fast with the fix. I'll run the updated version on these monster files tomorrow; I can even share some benchmarks if you want.

The connected-components-3d package might also be very useful for this project. Sadly, my research group does not (yet) open source any of its code; otherwise I could show you some examples of using pybind11 to speed up image processing operations. It seems your group is very active in sharing the fruits of your toil, which is very admirable. I am quite envious.

william-silversmith commented 5 years ago

That would be super cool. :) I'd be happy to post benchmarks if you submit them.

Thanks! I'm in a weird position where my lab employs software developers. We try to be very open. I'm happy this code is helpful for you. I had a lot of fun making it!