keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

HDF5 Normalizer not working. #8304

Closed prakashjayy closed 3 years ago

prakashjayy commented 6 years ago
import numpy as np
from keras.utils import HDF5Matrix

def preprocess_train(array):
    """Randomly crop every frame to 112x112 and scale pixel values to [0, 1].

    array: one sample (16, 128, 171, 3) or a batch (N, 16, 128, 171, 3).
    """
    num1 = np.random.randint(0, 128 - 112)  # top offset along the height axis
    num2 = np.random.randint(0, 171 - 112)  # left offset along the width axis
    # The ellipsis keeps the crop on the height/width axes for single samples and batches alike.
    crop = array[..., num1:num1+112, num2:num2+112, :]
    crop = crop / 255.0
    return crop

X_train = HDF5Matrix(train_loc, 'images', start=0, normalizer=preprocess_train)
y_train = HDF5Matrix(train_loc, 'labels')
model_final.fit(X_train, y_train, batch_size=16, shuffle='batch', validation_data=(X_test, y_test), epochs=10)
ValueError: Error when checking model input: expected conv1_input to have shape (None, 16, 112, 112, 3) but got array with shape (5797, 16, 128, 171, 3)

Basically, I have an HDF5 file whose images dataset has shape (5797, 16, 128, 171, 3), and my preprocessing function should turn each sample into (16, 112, 112, 3). This is not happening.

However, when I create only X_train and call X_train.__getitem__(1), it returns an array with shape (16, 112, 112, 3).

I'm not sure where I am going wrong. Can someone help me?
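A side note on the crop itself, as a minimal sketch: the normalizer may be handed either a single sample or a whole batch, depending on how the matrix is indexed. The two helper functions below are hypothetical (they are not in the original post) and compare indexing the crop from the front of the array with indexing it from the back using an ellipsis. The offsets 5 and 10 are arbitrary.

```python
import numpy as np

CROP = 112  # target height and width, as in the post


def crop_from_front(array, top, left):
    # Counts axes from the front: fine for one sample (frames, H, W, C),
    # but on a batch (N, frames, H, W, C) it slices frames and height instead.
    return array[:, top:top + CROP, left:left + CROP, :]


def crop_from_back(array, top, left):
    # The ellipsis keeps all leading axes, so H and W are always the axes cropped.
    return array[..., top:top + CROP, left:left + CROP, :]


sample = np.zeros((16, 128, 171, 3), dtype=np.uint8)    # one clip
batch = np.zeros((2, 16, 128, 171, 3), dtype=np.uint8)  # two clips

shapes = {
    "front/sample": crop_from_front(sample, 5, 10).shape,
    "front/batch": crop_from_front(batch, 5, 10).shape,
    "back/sample": crop_from_back(sample, 5, 10).shape,
    "back/batch": crop_from_back(batch, 5, 10).shape,
}
for name, shape in shapes.items():
    print(name, shape)
```

Only the ellipsis version yields 112x112 frames in both cases, which matters because batch-wise training indexes the matrix with slices rather than single integers.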

kmader commented 6 years ago

HDF5Matrix calculates its shape from the shape of the input data (https://github.com/keras-team/keras/blob/aab55e649c34f8a24f00ee63922d049d3417c979/keras/utils/io_utils.py#L97-L113). A normalizer function that changes the dimensions (or the dtype) will therefore not work, because model.fit checks the input against that predefined shape and type.
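To illustrate the mismatch described above, here is a toy sketch in plain Python. The class RawShapeDataset and the function crop_rows are hypothetical stand-ins, not Keras code: the shape property looks only at the stored data, while indexing returns normalized items, so the two can disagree.

```python
class RawShapeDataset:
    """Hypothetical stand-in (not Keras code) for a dataset wrapper."""

    def __init__(self, rows, normalizer=None):
        self.rows = rows
        self.normalizer = normalizer

    @property
    def shape(self):
        # Derived from the stored data only; the normalizer is never consulted.
        return (len(self.rows), len(self.rows[0]))

    def __getitem__(self, key):
        item = self.rows[key]
        return item if self.normalizer is None else self.normalizer(item)


def crop_rows(rows):
    # Toy normalizer: keep the first 4 values of every row in a batch.
    return [row[:4] for row in rows]


dataset = RawShapeDataset([[0] * 8 for _ in range(5)], normalizer=crop_rows)

reported = dataset.shape              # what a shape check would see
batch = dataset[0:2]                  # what training would actually receive
actual = (len(batch), len(batch[0]))

print("reported:", reported)
print("actual batch:", actual)
```

The reported per-row size (8) differs from the size of the rows that indexing returns (4), which is the same kind of disagreement the ValueError in the original post points to.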

kmader commented 6 years ago

As a temporary hack-fix, you can use the NormalizedHDF5Matrix class below in place of HDF5Matrix, and it should work:

from keras.utils import HDF5Matrix

# Subclass that reports the shape and dtype of the data after the normalizer has run
class NormalizedHDF5Matrix(HDF5Matrix):
    def __init__(self, datapath, dataset, start=0, end=None, normalizer=None):
        # Fall back to the identity function when no normalizer is given
        ds_norm = (lambda x: x) if normalizer is None else normalizer
        super(NormalizedHDF5Matrix, self).__init__(datapath, dataset, start=start, end=end, normalizer=ds_norm)
        # Probe one normalized sample to learn the post-normalizer shape and dtype
        t_val = self[0:1]
        self._base_shape = t_val.shape[1:]
        self._base_dtype = t_val.dtype

    @property
    def shape(self):
        """Gets a numpy-style shape tuple giving the dataset dimensions.
        # Returns
            A numpy-style shape tuple.
        """
        return (self.end - self.start,) + self._base_shape

    @property
    def dtype(self):
        """Gets the datatype of the dataset.
        # Returns
            A numpy dtype string.
        """
        return self._base_dtype