qijiezhao / pseudo-3d-pytorch

PyTorch version of Pseudo-3D Residual Networks (P3D); a pretrained model is supported
MIT License
450 stars · 113 forks

The model does not converge #8

Closed zswzifir closed 6 years ago

zswzifir commented 6 years ago

Hi, I am having trouble training the model: it does not converge. I really need your help, thanks very much. Here is some of my code.

The optimizer setting:

```python
policies = get_optim_policies(model)
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(policies, args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
```
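(For context, `get_optim_policies` is defined in this repo; I am not reproducing it here. As a minimal sketch of what such per-parameter-group policies look like, assuming a hypothetical split that exempts 1-D parameters, i.e. biases and BatchNorm weights, from weight decay:)

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; substitute the P3D model in practice.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

# Split parameters into two groups: weight-decayed and not.
decay, no_decay = [], []
for name, p in model.named_parameters():
    if p.dim() == 1:          # biases and BN affine params are 1-D
        no_decay.append(p)
    else:
        decay.append(p)

policies = [
    {'params': decay, 'weight_decay': 1e-4},
    {'params': no_decay, 'weight_decay': 0.0},
]
optimizer = torch.optim.SGD(policies, lr=0.001, momentum=0.9)
print(len(optimizer.param_groups))  # 2
```

If the real `get_optim_policies` sets per-group `lr` multipliers, a mismatch between those multipliers and the base `lr` is a common cause of non-convergence, so the groups are worth printing out once.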

The transform setting:

```python
train_transform = video_transforms.Compose([
    video_transforms.Scale((182)),
    video_transforms.MultiScaleCrop((160, 160), scale_ratios),
    video_transforms.RandomHorizontalFlip(),
    video_transforms.ToTensor(),
    normalize
])

test_transform = video_transforms.Compose([
    video_transforms.Scale((182)),
    video_transforms.CenterCrop((160)),
    video_transforms.ToTensor(),
    normalize
])
```
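(A quick shape sanity check is worth running on one batch from the loader before debugging anything else. Assuming the usual P3D input layout of 16-frame clips at the 160×160 crop size above, i.e. `(batch, channels, frames, height, width)`, a sketch with a random stand-in batch:)

```python
import torch

# Stand-in for one batch from train_loader; replace with a real batch.
clip = torch.randn(4, 3, 16, 160, 160)

# P3D-style models expect 5-D input: (N, C, T, H, W).
assert clip.dim() == 5
assert clip.shape[1:] == (3, 16, 160, 160)

# After normalization, values should be roughly zero-centred;
# a mean far from 0 suggests `normalize` uses the wrong statistics.
print(float(clip.mean()))
```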

The train step:

```python
def train(train_loader, model, criterion, optimizer, epoch):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top3 = AverageMeter()

    model.train()
    end = time.time()
    for i, (inp, target) in enumerate(train_loader):
        # measure data loading time
        #show_loader_item(inp, target)
        data_time.update(time.time() - end)
        inp = inp.float().cuda(async=True)
        target = target.cuda(async=True)
        input_var = torch.autograd.Variable(inp)
        target_var = torch.autograd.Variable(target)

        output = model(input_var)
        loss = criterion(output, target_var)
        writer.add_scalar('data/loss', loss, i + epoch * len(train_loader))

        # measure accuracy and record loss
        prec1, prec3 = accuracy(output.data, target, topk=(1, 3))
        losses.update(loss.data[0], inp.size(0))
        top1.update(prec1[0], inp.size(0))
        top3.update(prec3[0], inp.size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()
        if i % args.print_freq == 0:
            #for name, param in model.named_parameters():
            #    writer.add_histogram(name, param, i)
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@3 {top3.val:.3f} ({top3.avg:.3f})'.format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses, top1=top1, top3=top3))
```

Can you help me to check if there are some mistakes?

qijiezhao commented 6 years ago

From the code you posted, I cannot identify the part that causes the training error. Maybe you should check the LR, the policies, and the input images more carefully.
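(One standard way to check those pieces is to verify the model can overfit a single fixed batch: if the loss does not drop there, the problem is in the model/optimizer setup rather than the data pipeline. A minimal sketch with a hypothetical stand-in model; substitute the P3D model and one real batch:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and fixed batch; replace with P3D and a real clip batch.
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inp = torch.randn(8, 10)
target = torch.randint(0, 3, (8,))

first_loss = None
for step in range(200):
    loss = criterion(model(inp), target)
    if first_loss is None:
        first_loss = loss.item()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The final loss should be far below the initial one on a single batch.
print(first_loss, loss.item())
```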

zswzifir commented 6 years ago

Thanks for the reply. My lr is 0.001, multiplied by 0.1 every 10 epochs, and the policies are the same as in your code. I will check my code again. Thanks again.
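(For reference, that schedule can be expressed with `torch.optim.lr_scheduler.StepLR`; a sketch with a stand-in model, just to confirm the decay points land where intended:)

```python
import torch
import torch.nn as nn

# Stand-in model; the scheduler only touches the optimizer.
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# lr = 0.001, multiplied by 0.1 every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(25):
    # ... train(...) would run here ...
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

print(lrs[0], lrs[10], lrs[20])  # lr drops by 10x at epochs 10 and 20
```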

icoz69 commented 6 years ago

Hi, did you find the reason? The same thing happens to me as well.