idiap / attention-sampling

This Python package enables the training and inference of deep learning models for very large data, such as megapixel images, using attention-sampling

expected_with_replacement #20

Closed AndrewTal closed 2 years ago

AndrewTal commented 2 years ago

Hi,

I recently read this work, and it's a really good idea and an impressive piece of work! I do have a question, though: what happens if we change the function _expected_with_replacement from:

@K.tf.custom_gradient
def _expected_with_replacement(weights, attention, features):
    """Approximate the expectation as if the samples were i.i.d. from the
    attention distribution.
    The gradient is simply scaled w.r.t. the sampled attention probability to
    account for samples that are unlikely to be chosen.
    """
    # Compute the expectation
    wf = expand_many(weights, [-1] * (K.ndim(features) - 2))
    F = K.sum(wf * features, axis=1)

    # Compute the gradient
    def gradient(grad):
        grad = K.expand_dims(grad, 1)

        # Gradient w.r.t. the attention
        ga = grad * features
        ga = K.sum(ga, axis=list(range(2, K.ndim(ga))))
        ga = ga * weights / attention

        # Gradient w.r.t. the features
        gf = wf * grad

        return [None, ga, gf]

    return F, gradient

to:

def _expected_with_replacement(weights, attention, features):
    wf = expand_many(weights, [-1] * (K.ndim(features) - 2))
    F = K.sum(wf * features, axis=1)
    return F

That means we no longer use the back-propagation method described in the paper. End-to-end training can still be done in this case.
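To make the difference concrete, here is a small self-contained sketch in plain TensorFlow 2 (toy shapes, made-up values, and my own function names, not the library's actual sampler weights) comparing the two variants: with the custom gradient, the sampled attention values receive the rescaled gradient weights / attention, while the plain forward pass never touches attention, so autodiff has nothing to propagate to it from this op.

import tensorflow as tf

# Toy, made-up values: batch of 1, 3 sampled patches, 4-d features per patch.
weights   = tf.constant([[1 / 3, 1 / 3, 1 / 3]])   # per-sample weights
attention = tf.constant([[0.5, 0.3, 0.2]])         # attention of the sampled patches
features  = tf.random.normal([1, 3, 4])            # per-patch feature vectors

@tf.custom_gradient
def expected_with_custom_grad(weights, attention, features):
    wf = weights[..., None]                       # broadcast weights over the feature dim
    F = tf.reduce_sum(wf * features, axis=1)      # weighted sum of the features

    def gradient(grad):
        grad = tf.expand_dims(grad, 1)
        ga = tf.reduce_sum(grad * features, axis=-1)  # gradient w.r.t. the attention ...
        ga = ga * weights / attention                 # ... rescaled by weights / attention
        gf = wf * grad                                # gradient w.r.t. the features
        return [None, ga, gf]

    return F, gradient

def expected_plain(weights, attention, features):
    # Same forward pass, but no custom gradient: `attention` is unused,
    # so automatic differentiation sends it no gradient from this op.
    wf = weights[..., None]
    return tf.reduce_sum(wf * features, axis=1)

for name, fn in [("custom gradient", expected_with_custom_grad),
                 ("plain autodiff", expected_plain)]:
    with tf.GradientTape() as tape:
        tape.watch(attention)
        loss = tf.reduce_sum(fn(weights, attention, features))
    print(name, tape.gradient(loss, attention))

In this toy version the second print shows None for the attention gradient, since the plain forward pass does not use attention at all.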

Would the experimental results get worse under this change?

Thanks!