yajiemiao / pdnn

PDNN: A Python Toolkit for Deep Learning. http://www.cs.cmu.edu/~ymiao/pdnntk.html
Apache License 2.0
224 stars 105 forks

Increasingly negative loss in denoising autoencoder #49

Open pumakim opened 7 years ago

pumakim commented 7 years ago

Hi, I am training a 3-layer stacked denoising autoencoder whose loss function differs slightly from the standard one.

I want each autoencoder to reconstruct the 'global' input, i.e. the original input that was fed to the first layer, instead of the previous layer's output, while still taking its usual input (the previous layer's output).

To do this, I changed the 'self.x' references in the cost expressions of get_cost_updates in layers/da.py to 'self.x_global'. Here self.x_global is the original input fed to the first layer (self.x in models/sda.py).
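Roughly, the wiring in models/sda.py looks like this (a simplified sketch; x_global is an attribute name I made up, not part of the stock pdnn code, and the surrounding construction code is paraphrased):

    # models/sda.py (sketch): after the dA layers are built, give each one a
    # handle to the ORIGINAL network input so the edited cost can target it
    for dA_layer in self.dA_layers:
        dA_layer.x_global = self.x   # original input fed to the first layer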

The training output was as follows.

[2016-12-04 01:36:48.845309] > ... training the model
[2016-12-04 01:37:01.856494] > layer 0, epoch 0, reconstruction cost 405.427734
[2016-12-04 01:37:15.579682] > layer 0, epoch 1, reconstruction cost 381.404175
[2016-12-04 01:37:29.242537] > layer 0, epoch 2, reconstruction cost 377.724701
[2016-12-04 01:37:43.045209] > layer 0, epoch 3, reconstruction cost 375.875977
[2016-12-04 01:37:56.615403] > layer 0, epoch 4, reconstruction cost 374.741211
[2016-12-04 01:38:11.105572] > layer 1, epoch 0, reconstruction cost -108174.476562
[2016-12-04 01:38:24.891239] > layer 1, epoch 1, reconstruction cost -334065.656250
[2016-12-04 01:38:38.807076] > layer 1, epoch 2, reconstruction cost -561826.187500
[2016-12-04 01:38:52.979225] > layer 1, epoch 3, reconstruction cost -790545.687500
[2016-12-04 01:39:07.143726] > layer 1, epoch 4, reconstruction cost -1019794.250000
[2016-12-04 01:39:21.975468] > layer 2, epoch 0, reconstruction cost -152930.156250
[2016-12-04 01:39:36.551489] > layer 2, epoch 1, reconstruction cost -460353.750000
[2016-12-04 01:39:51.328428] > layer 2, epoch 2, reconstruction cost -767839.625000
[2016-12-04 01:40:05.910295] > layer 2, epoch 3, reconstruction cost -1075358.750000
[2016-12-04 01:40:20.484577] > layer 2, epoch 4, reconstruction cost -1382889.500000

The reconstruction cost keeps getting more negative. Is this normal? What does it mean?

Here is my edited code (only self.x changed to self.x_global relative to the original):

self.x_global is self.x in models/sda.py (the original input)
###################################################
def get_last_cost_updates(self, corruption_level, learning_rate, momentum):
    """ This function computes the cost and the updates for one training
    step of the dA """

    # corrupt the layer's own input and pass it through encoder and decoder
    tilde_x = self.get_corrupted_input(self.x, corruption_level)
    y = self.get_hidden_values(tilde_x)
    z = self.get_reconstructed_input(y)
    # cross-entropy reconstruction cost, measured against the global input
    L = - T.sum(self.x_global * T.log(z) + (1 - self.x_global) * T.log(1 - z), axis=1)

    if self.reconstruct_activation is T.tanh:
        # squared-error reconstruction cost when the decoder uses tanh
        L = T.sqr(self.x_global - z).sum(axis=1)

    if self.sparsity_weight is not None:
        sparsity_level = T.extra_ops.repeat(self.sparsity, self.n_hidden)
        avg_act = y.mean(axis=0)

        kl_div = self.kl_divergence(sparsity_level, avg_act)

        cost = T.mean(L) + self.sparsity_weight * kl_div.sum()
    else:
        cost = T.mean(L)

    # compute the gradients of the cost of the `dA` with respect
    # to its parameters (derivative of cost with respect to params)
    gparams = T.grad(cost, self.params)
    # generate the list of updates
    updates = collections.OrderedDict()
    for dparam, gparam in zip(self.delta_params, gparams):
        updates[dparam] = momentum * dparam - gparam * learning_rate
    for dparam, param in zip(self.delta_params, self.params):
        updates[param] = param + updates[dparam]

    return (cost, updates)

###################################################
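For completeness, I build the pretraining step from this cost/updates pair in the usual Theano way (a paraphrased sketch; names like dA_layer, sda, train_set_x and batch_size are illustrative, and the real code in models/sda.py may differ):

    import theano
    import theano.tensor as T

    index = T.lscalar('index')   # minibatch index
    cost, updates = dA_layer.get_last_cost_updates(corruption_level=0.2,
                                                   learning_rate=0.01,
                                                   momentum=0.5)
    train_fn = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={sda.x: train_set_x[index * batch_size:
                                   (index + 1) * batch_size]})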

MaigoAkisame commented 7 years ago

I'm not sure if this is caused by overflow.
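One quick check, independent of overflow: the cross-entropy term is only guaranteed to be non-negative when the target values (self.x_global) lie in [0, 1]. A toy NumPy evaluation of the same expression (my own illustrative numbers, nothing from pdnn):

    import numpy as np

    def xent(x, z):
        # same per-sample term as in get_last_cost_updates
        return -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z), axis=1)

    print(xent(np.array([[0.2, 0.8]]), np.array([[0.3, 0.7]])))    # ~1.05, positive
    print(xent(np.array([[2.0, 3.0]]), np.array([[0.99, 0.99]])))  # ~-13.8, negative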

I see two possible paths for calculating L and cost. Do you know which path is actually executed in your experiment?
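For reference, the two branches in the posted get_last_cost_updates are, schematically:

    # path 1: cross-entropy, used unless reconstruct_activation is T.tanh
    L = - T.sum(self.x_global * T.log(z) + (1 - self.x_global) * T.log(1 - z), axis=1)

    # path 2: squared error, used when reconstruct_activation is T.tanh
    L = T.sqr(self.x_global - z).sum(axis=1)

    # either way: cost = T.mean(L), plus the sparsity penalty if enabled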