luispedro / mahotas

Computer Vision in Python
https://mahotas.rtfd.io

haralick returns empty array #67

Closed: damonmaria closed this issue 8 years ago

damonmaria commented 8 years ago

On one particular input, haralick returns an empty array. It has worked fine on millions of other inputs, so I know the problem is specific to this input.

>>> import numpy as np
>>> import mahotas as mh
>>> a = np.array([[  0,   0,   0,   0,   0,   0,   0,   0, 131,   0, 139,   0,   0,   0,   0,   0,   0,   0,
...                  0,   0,   0,   0,   0,   0,   0,   0],
...               [  0,   0,   0,   0,   0, 136,   0,   0,   0,   0,   0,   0,   0, 131,   0,   0,   0, 136,
...                  0,   0, 139,   0,   0,   0,   0,   0],
...               [  0,   0,   0,   0,   0,   0,   0, 137,   0, 139,   0,   0,   0,   0,   0, 129,   0,   0,
...                  0,   0,   0,   0,   0,   0,   0,   0]])
>>> mh.features.haralick(a, ignore_zeros=True).shape
(0,)

It also depends on ignore_zeros:

>>> mh.features.haralick(a, ignore_zeros=False).shape
(4, 13)

Can the Haralick features not be calculated for this input? If so, the docs should mention that such a result is possible.

luispedro commented 8 years ago

Mmmm, it is strange behaviour. I am not sure what the result should be in this case, though.

(If you set ignore_zeros, the adjacency matrices are all zeros.) What kind of output would you expect? An error? NaNs? Zeros?
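The all-zeros claim can be checked directly: once the gray level 0 is discarded, a co-occurrence matrix only counts pairs of neighbouring pixels that are both nonzero. The sketch below counts those pairs with plain NumPy rather than mahotas' own co-occurrence routine. The helper name `nonzero_pair_counts` is hypothetical, and the offset list assumes the same four 2-D directions a Haralick computation uses, not necessarily in mahotas' internal order.

```python
import numpy as np

# (row, col) offsets for the four 2-D neighbour directions: horizontal,
# diagonal, vertical and anti-diagonal. Assumption: these are the same four
# directions a 2-D Haralick computation uses, not necessarily in mahotas' order.
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1)]


def nonzero_pair_counts(img):
    """For each direction, count neighbouring pixel pairs that are both nonzero.

    These are the only pairs left in a co-occurrence matrix once the row and
    column for gray level 0 are discarded (the ignore_zeros idea).
    """
    img = np.asarray(img)
    rows, cols = img.shape
    counts = []
    for dr, dc in OFFSETS:
        # Overlapping windows chosen so that first[i, j] and second[i, j]
        # are neighbours along direction (dr, dc).
        c_lo, c_hi = max(0, -dc), cols - max(0, dc)
        first = img[0:rows - dr, c_lo:c_hi]
        second = img[dr:rows, c_lo + dc:c_hi + dc]
        counts.append(int(np.count_nonzero((first != 0) & (second != 0))))
    return counts


# The input from the report above.
a = np.array([[  0,   0,   0,   0,   0,   0,   0,   0, 131,   0, 139,   0,   0,   0,   0,   0,   0,   0,
                 0,   0,   0,   0,   0,   0,   0,   0],
              [  0,   0,   0,   0,   0, 136,   0,   0,   0,   0,   0,   0,   0, 131,   0,   0,   0, 136,
                 0,   0, 139,   0,   0,   0,   0,   0],
              [  0,   0,   0,   0,   0,   0,   0, 137,   0, 139,   0,   0,   0,   0,   0, 129,   0,   0,
                 0,   0,   0,   0,   0,   0,   0,   0]])

print(nonzero_pair_counts(a))  # [0, 0, 0, 0]: no direction has a nonzero pair
```

Every nonzero pixel in this input is isolated, so nothing survives in any direction. With ignore_zeros=False, pairs involving 0 (such as (0, 131)) are still counted, which is why that call returns the (4, 13) array shown above.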

damonmaria commented 8 years ago

I don't think any value in a normally shaped result array would make sense. My suggestion would be an exception, because then at least the reason for it can be explained. Overall, I think the most important thing is to document what can happen in the parameter description of ignore_zeros, so that the caller hopefully knows to deal with it.
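Until the library decides on a behaviour, the exception approach can be emulated on the caller side. The sketch below is a hypothetical helper, `haralick_or_raise`, not part of mahotas. It only inspects the array that `mh.features.haralick` already returned, and it assumes that an empty result means every co-occurrence matrix was empty.

```python
import numpy as np


def haralick_or_raise(features):
    """Raise instead of silently passing on an empty Haralick result.

    `features` is assumed to be the array returned by
    mh.features.haralick(..., ignore_zeros=True); this helper is hypothetical.
    """
    features = np.asarray(features)
    if features.size == 0:
        # Empty output: with zeros ignored, no co-occurrence matrix had any counts.
        raise ValueError(
            "Haralick features could not be computed: with ignore_zeros=True, "
            "every co-occurrence matrix is empty (no two neighbouring pixels "
            "are both nonzero)."
        )
    return features
```

Usage would look like `feats = haralick_or_raise(mh.features.haralick(a, ignore_zeros=True))`, so the failure surfaces with an explanation at the call site rather than as a `(0,)`-shaped array further downstream.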

hackermd commented 8 years ago

I ran into a related problem: mh.features.haralick(a, ignore_zeros=True, return_mean=True) returns nan instead of a NumPy array.

The problem is that the array of feature values at line 321 of mh.features.texture.haralick_features() is empty, and taking the mean of an empty array yields nan:

import numpy as np
a = np.array([])
a.mean(axis=0)  # nan, plus a RuntimeWarning ("Mean of empty slice")

In my opinion, returning an array of NaNs would be the cleanest solution, since we would at least get the same data types (numpy.array with float values) and consistent dimensions ((4, 13)).
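This proposal can also be approximated on the caller side, as sketched below. `haralick_or_nan` and `mean_or_nan` are hypothetical helpers, not mahotas functions. They assume a 2-D image, where the ignore_zeros=False output above has 4 directions by 13 features; other input dimensionalities would need a different shape.

```python
import numpy as np

# Assumption: a 2-D input yields 4 directions x 13 features, matching the
# (4, 13) shape reported above for ignore_zeros=False.
N_DIRECTIONS, N_FEATURES = 4, 13


def haralick_or_nan(features):
    """Replace an empty Haralick result with a (4, 13) array of NaNs."""
    features = np.asarray(features, dtype=float)
    if features.size == 0:
        return np.full((N_DIRECTIONS, N_FEATURES), np.nan)
    return features


def mean_or_nan(features):
    """Mean over directions; gives 13 NaNs instead of a bare nan when empty."""
    return haralick_or_nan(features).mean(axis=0)
```

Because `return_mean=True` collapses the empty result to a scalar nan, this sketch averages the full per-direction array itself, e.g. `mean_or_nan(mh.features.haralick(a, ignore_zeros=True))`. The output then keeps the same dtype and length as a normal result, and `np.isnan` can flag the degenerate inputs.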

A warning in the docs would be nice and would alert the user that this may happen in rare cases when setting ignore_zeros to True.

luispedro commented 8 years ago

Thanks guys. Commit https://github.com/luispedro/mahotas/commit/fd097559a7e35fd8198341ae53e898bdf0ed897c improved the docs too.