vshallc / PtrNets

Pointer Networks

Dropout #2

Open ppotash opened 8 years ago

ppotash commented 8 years ago

Hi,

Great code and thanks for sharing :). How do you run the model with dropout? It doesn't seem to actually be implemented in the training process.

-Peter

vshallc commented 8 years ago

Hi Peter, Unfortunately, I did not actually implement dropout (although there is a dropout_layer method I forgot to delete...)

If you'd like to add dropout layer(s), you could probably add them:

- between the input and encoder layers, in the _lstm function (line 271)
- between the decoder and output layers, in _ptr_probs (line 287)

in a similar way to this code.
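
E.g. something like this sketch (apply_dropout and use_noise are illustrative names, not this repo's API; h stands in for the output of _lstm):

import numpy
import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

trng = MRG_RandomStreams(seed=1234)
# 1.0 during training, 0.0 during testing
use_noise = theano.shared(numpy.asarray(1.0, dtype=theano.config.floatX))

def apply_dropout(state, p=0.5):
    # training: multiply by a Bernoulli keep-mask
    # testing: scale by the keep probability instead
    mask = trng.binomial(state.shape, p=p, n=1, dtype=state.dtype)
    return tensor.switch(use_noise, state * mask, state * p)

h = tensor.matrix('h')   # stands in for the output of _lstm
h = apply_dropout(h)     # dropout before feeding _ptr_probs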

cheers xiaoxi

ppotash commented 8 years ago

Hi Xiaoxi,

Thanks for the response. If I put it in the lstm function, I need some switch to say whether I'm in training or testing mode, right? And this would also need to be an extra parameter to the function, I imagine.

-Peter

ppotash commented 8 years ago

Ok, I see how you do it from the code you linked to. Very 'theanic' :).

vshallc commented 8 years ago

Yes, very theanic ;)

and maybe here is a better solution: theano_toolkit

-- Edit -- Sorry, wrong link. I meant: neural-turing-machines

A smarter way of using closures to build layers.
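
For example (hypothetical names, not that repo's exact API): a builder function returns the layer, closing over the random stream and the train/test flag, so they never have to be passed around again:

import numpy
import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

def build_dropout(trng, use_noise, p=0.5):
    # the returned layer captures trng and use_noise from this scope
    def dropout(x):
        mask = trng.binomial(x.shape, p=p, n=1, dtype=x.dtype)
        return tensor.switch(use_noise, x * mask, x * p)
    return dropout

trng = MRG_RandomStreams(seed=42)
use_noise = theano.shared(numpy.asarray(1.0, dtype=theano.config.floatX))
dropout = build_dropout(trng, use_noise)

x = tensor.matrix('x')
y = dropout(x)   # usable like any other layer now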

ppotash commented 8 years ago

I'm still confused about using dropout in this implementation. Does trng need to be passed to the lstm function directly as an extra parameter? I tried that and got this error:

theano.tensor.var.AsTensorError: ('Cannot convert <theano.sandbox.rng_mrg.MRG_RandomStreams object at 0x7f1c87425890> to TensorType', <class 'theano.sandbox.rng_mrg.MRG_RandomStreams'>)

I also tried leaving it as a global variable in the ptr_network function, and that didn't work (and similarly tried initializing it in build_model and passing it as a parameter to the ptr_network function).
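
For concreteness, the error looks like a type problem rather than scoping: trng is a plain Python object, so anywhere Theano tries to convert it to a symbolic tensor (e.g. scan's non_sequences), the conversion fails. A made-up minimal illustration:

import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

trng = MRG_RandomStreams(seed=1234)
x = tensor.matrix('x')

# Raises the AsTensorError above, since trng is not a symbolic variable:
# theano.scan(lambda x_t, r: x_t, sequences=[x], non_sequences=[trng])

# Outside scan, consuming trng at graph-construction time compiles fine:
y = x * trng.binomial(x.shape, p=0.5, n=1, dtype=x.dtype)
f = theano.function([x], y)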

-Peter

vshallc commented 8 years ago

I guess you used an integer as the flag for training/testing; it should be a float. E.g. in tensor.switch(is_training, ...), the is_training flag should be 1.0/0.0 instead of 1/0.
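
A tiny sketch of the flag (illustrative names):

import numpy
import theano
import theano.tensor as tensor

# a shared floatX scalar, so the condition stays floating point
is_training = theano.shared(numpy.asarray(1.0, dtype=theano.config.floatX))

x = tensor.matrix('x')
out = tensor.switch(is_training, x * 2.0, x)  # stand-in for the dropout branches
f = theano.function([x], out)

is_training.set_value(1.0)  # training
is_training.set_value(0.0)  # testing (floats, not Python ints)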

ppotash commented 8 years ago

I'm talking about the RandomStreams instance used for the binomial function.

-Peter

ppotash commented 8 years ago

Fyi, I posted about this on stackoverflow: http://stackoverflow.com/questions/39606372/dropout-in-scan-theano
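
For reference, the usual pattern for random streams inside scan seems to be capturing the stream in the step function and passing scan's updates on to theano.function; a sketch with made-up names:

import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

trng = MRG_RandomStreams(seed=1234)
x = tensor.tensor3('x')   # (n_steps, batch, dim), like the lstm input

def _step(x_t):
    # the stream is captured from the enclosing scope, not passed in
    mask = trng.binomial(x_t.shape, p=0.5, n=1, dtype=x_t.dtype)
    return x_t * mask

results, updates = theano.scan(_step, sequences=[x])
# without updates=updates the random state never advances inside scan
f = theano.function([x], results, updates=updates)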


vshallc commented 8 years ago

I've tried changing dropout_layer into:

def dropout_layer(state_before, use_noise, trng, shape):
    # training (use_noise = 1.0): multiply by a Bernoulli keep-mask
    # testing (use_noise = 0.0): scale by the keep probability instead
    proj = tensor.switch(use_noise,
                         (state_before *
                          # trng.binomial(state_before.shape,  # symbolic shape, replaced
                          trng.binomial(shape,  # fixed shape passed in by the caller
                                        p=0.5, n=1,
                                        dtype=state_before.dtype)),
                         state_before * 0.5)
    return proj

and call it as: h = dropout_layer(h, 1.0, trng, (options['batch_size'], options['dim_proj']))

and comment out these lines in the get_minibatches_idx function, so that every minibatch has the same shape:

def get_minibatches_idx(n, minibatch_size, shuffle=False):
    """ 
    Used to shuffle the dataset at each iteration.
    """

    idx_list = numpy.arange(n, dtype="int32")

    if shuffle:
        numpy.random.shuffle(idx_list)

    minibatches = []
    minibatch_start = 0 
    for i in range(n // minibatch_size):
        minibatches.append(idx_list[minibatch_start:minibatch_start + minibatch_size])
        minibatch_start += minibatch_size

    ''' remove these lines
    if minibatch_start != n:
        # Make a minibatch out of what is left
        minibatches.append(idx_list[minibatch_start:])
    '''
    return zip(range(len(minibatches)), minibatches)

It works in the training stage. But it fails in the evaluation stage, because there the shape will be (options['beam_width'], options['dim_proj']). So I guess a temporary solution would be to build another f_encode method for this shape for beam search...
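
E.g. something like this, reusing the dropout_layer above (the option values are placeholders):

import numpy
import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

trng = MRG_RandomStreams(seed=1234)
use_noise = theano.shared(numpy.asarray(1.0, dtype=theano.config.floatX))
options = {'batch_size': 20, 'beam_width': 5, 'dim_proj': 64}

def build_encode(n_rows):
    # compile one function per fixed leading dimension:
    # batch_size for training, beam_width for beam search
    h = tensor.matrix('h')   # stands in for the real encoder state
    h_drop = dropout_layer(h, use_noise, trng,
                           (n_rows, options['dim_proj']))
    return theano.function([h], h_drop)

f_encode = build_encode(options['batch_size'])
f_encode_beam = build_encode(options['beam_width'])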