naba89 / RNN-Handwriting-Generation-Pytorch

Sequence generation using an RNN in PyTorch
Apache License 2.0

Can a loss be negative? #1

Open educob opened 5 years ago

educob commented 5 years ago

Hi. Yesterday I adapted the net to run with PyTorch 1.1. After 40 epochs its writing was not good at all, and the error is not improving.

But what most confused me is that loss is many times negative. Can a loss be negative?

Thanks.

naba89 commented 5 years ago

Hi

Well, the code used to work on 0.4; I have not tested it on 1.1 yet, so some APIs may have changed.
The loss can be negative. What does your training graph look like? Does it in general follow the one I shared in the README?
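On why it can go below zero: the loss is built from log-densities of continuous pen offsets, and a density can be larger than 1, so its negative log can be negative. A minimal, repo-independent sketch of that effect:

import torch
from torch.distributions import Normal

# A narrow Gaussian evaluated near its mean has density > 1,
# so the negative log-likelihood comes out negative.
dist = Normal(loc=torch.tensor(0.0), scale=torch.tensor(0.05))
nll = -dist.log_prob(torch.tensor(0.01))
print(nll.item())  # roughly -2.06, i.e. a negative "loss"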

And what exactly do you mean by the writing not being good? Is it very small, not in a line, do some strokes go very long, etc.?

However, there is one thing I would like you to modify and check:

In line 63 of model.py, change this line:

if self.hidden is not None:

to:

if self.hidden is not None and self.training:

Check if that works well while generating sequences.
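In case it helps to see the shape of that change, here is a rough, self-contained sketch of how such a guard could sit in a forward pass; the class and attribute names below are stand-ins, not the repo's actual RNNPredictNet:

import torch
import torch.nn as nn

class TinyPredictNet(nn.Module):  # hypothetical stand-in, for illustration only
    def __init__(self, input_size=3, hidden_size=256, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, input_size)
        self.hidden = None

    def forward(self, x):
        if self.hidden is not None and self.training:
            # training: reuse the stored hidden state, detached so gradients
            # do not flow back into the previous sequence chunk
            hidden = tuple(h.detach() for h in self.hidden)
        else:
            # otherwise (e.g. while generating in eval mode) start from a
            # fresh zero state
            hidden = None
        out, self.hidden = self.lstm(x, hidden)
        return self.fc(out)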

educob commented 5 years ago

Hi. The loss starts between 2.5 and 3.5 and goes down quite fast to 0.3-0.5 (both positive and negative). I don't know how to start the TensorBoard thing, but in principle the curve looks similar to yours. But the writing is just gibberish; it doesn't look like writing at all.

I made the change with self.training but the result looks the same as before.

Thanks for the code.

This is my modified train.py:

import argparse
import os
import pickle
import time

import numpy as np

import torch.optim as optim
import torch
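# note: Variable is imported below but unused (it has been a no-op wrapper since PyTorch 0.4)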
from torch.autograd import Variable

from loss_functions import PredictionLoss
from model import RNNPredictNet
from utils import DataLoader
from sample import sample_stroke

from tensorboardX import SummaryWriter

writer = SummaryWriter()
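# note: SummaryWriter() writes event files under ./runs by default; they can be
# viewed with `tensorboard --logdir runs` (http://localhost:6006)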

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_size', type=int, default=3,
                        help='input num features')
    parser.add_argument('--hidden_size', type=int, default=256,
                        help='size of RNN hidden state')
    parser.add_argument('--num_layers', type=int, default=2,
                        help='number of layers in the RNN')
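    # note: argparse's type=bool turns any non-empty string into True, so
    # passing --bidirectional False on the command line still enables it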
    parser.add_argument('--bidirectional', type=bool, default=False,
                        help='use BLSTM')
    parser.add_argument('--batch_size', type=int, default=50,
                        help='batch size')
    parser.add_argument('--seq_length', type=int, default=300,
                        help='RNN sequence length')
    parser.add_argument('--num_epochs', type=int, default=300,
                        help='number of epochs')
    parser.add_argument('--save_every', type=int, default=100,
                        help='save frequency')
    parser.add_argument('--model_dir', type=str, default='save',
                        help='directory to save model to')
    parser.add_argument('--grad_clip', type=float, default=10.,
                        help='clip gradients at this value')
    parser.add_argument('--learning_rate', type=float, default=0.005,
                        help='learning rate')
    parser.add_argument('--decay_rate', type=float, default=0.95,
                        help='decay rate for rmsprop')
    parser.add_argument('--num_mixture', type=int, default=20,
                        help='number of gaussian mixtures')
    parser.add_argument('--data_scale', type=float, default=20,
                        help='factor to scale raw data down by')
    parser.add_argument('--keep_prob', type=float, default=0.8,
                        help='dropout keep probability')
    parser.add_argument('--validate_every', type=int, default=10,
                        help='frequency of validation')
    args = parser.parse_args()
    train(args)

def train(args):
    data_loader = DataLoader(args.batch_size, args.seq_length, args.data_scale)

    if args.model_dir != '' and not os.path.exists(args.model_dir):
        os.makedirs(args.model_dir)

    with open(os.path.join(args.model_dir, 'config.pkl'), 'wb') as f:
        pickle.dump(args, f)

    model = RNNPredictNet(args).to(device)

    loss_fn = PredictionLoss(args.batch_size, args.seq_length)
    optimizer = optim.Adam(model.parameters(), lr=args.learning_rate)
    lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=args.decay_rate)

    #training_loss = []
    #validation_loss = []

    for e in range(args.num_epochs):
        data_loader.reset_batch_pointer()
        v_x, v_y = data_loader.validation_data()
        v_x = torch.FloatTensor(v_x).to(device)
        v_y = torch.FloatTensor(v_y).to(device)

        for b in range(data_loader.num_batches):
            model.train()
            train_step = e * data_loader.num_batches + b
            start = time.time()

            x, y = data_loader.next_batch()
            x = torch.FloatTensor(x).to(device)
            y = torch.FloatTensor(y).to(device)

            optimizer.zero_grad()
            output = model(x)

            train_loss = loss_fn(output, y)

            train_loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)

            optimizer.step()

            #training_loss.append(train_loss.data[0])
            writer.add_scalar('Training Loss', train_loss.item(), train_step)

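            # note: validation currently runs on every batch; the
            # --validate_every argument is parsed above but not used here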
            model.eval()
            with torch.no_grad():
                output = model(v_x)
                val_loss = loss_fn(output, v_y)
            #validation_loss.append(val_loss.data[0])

            end = time.time()

            print(
                "{}/{} (epoch {}), train_loss = {:.3f}, valid_loss = {:.3f}, time/batch = {:.3f}"
                    .format(
                    train_step,
                    args.num_epochs * data_loader.num_batches,
                    e,
                    train_loss.item(),
                    val_loss.item(),
                    end - start))

            if (train_step % args.save_every == 0) and (train_step > 0):

                checkpoint_path = os.path.join(args.model_dir, 'model.pth')
                torch.save({
                    'model': model.state_dict(),
                    'optimizer': optimizer.state_dict(),
                    'epoch': e,
                    'current_lr': args.learning_rate * (args.decay_rate ** e)
                },  checkpoint_path)

                _, img = sample_stroke() # error svg
                #print("model saved to {}".format(checkpoint_path))
        lr_scheduler.step()

if __name__ == '__main__':
    main()
    writer.close()
naba89 commented 5 years ago

Aah, so by gibberish you mean there are no words? That is expected, because the current version is unconditional: it just randomly generates strokes which look like handwriting. I did not get around to implementing the conditional version, where you can essentially give a word as input and the model will write that word in handwriting.
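To make "randomly generates strokes" concrete, here is a rough, generic sketch of unconditional sampling from a mixture-density output. This is not the repo's sample.py; the parameters below are random placeholders for the network's actual predictions, and the correlation term is left out:

import numpy as np

def sample_unconditional(num_steps=300, num_mixture=20, seed=0):
    rng = np.random.default_rng(seed)
    strokes = []
    point = np.array([0.0, 0.0, 1.0], dtype=np.float32)  # (dx, dy, end_of_stroke)
    for _ in range(num_steps):
        # In the real model these parameters would come from the RNN given
        # the previous point and hidden state; here they are random stand-ins.
        pi = rng.dirichlet(np.ones(num_mixture))            # mixture weights
        mu = rng.normal(size=(num_mixture, 2))              # component means
        sigma = np.exp(rng.normal(size=(num_mixture, 2)))   # component std devs
        eos_prob = 0.05                                     # end-of-stroke probability

        k = rng.choice(num_mixture, p=pi)        # pick a mixture component
        dx, dy = rng.normal(mu[k], sigma[k])     # sample the next pen offset
        eos = float(rng.random() < eos_prob)     # sample the pen-lift flag
        point = np.array([dx, dy, eos], dtype=np.float32)
        strokes.append(point)
    return np.stack(strokes)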

educob commented 5 years ago

Hi. Sorry I didn't explain myself well.

I mean that what it generates can't be considered handwriting, just random lines.

I know it's unconditional (actually what I wanted).

Cheers.


Weifeilong611 commented 4 years ago

Sorry, I don't know why the loss would be negative. Following your computational formula, it should not be a negative value. Can you give some details about this situation? Thanks!