princeton-vl / pose-ae-train

Training code for "Associative Embedding: End-to-End Learning for Joint Detection and Grouping"
BSD 3-Clause "New" or "Revised" License

How many epochs do you run? #19

Closed: ahangchen closed this issue 6 years ago

ahangchen commented 6 years ago

Since epoch_num is not specified in the task/pose.py config, we need to set the epoch count ourselves or interrupt training manually.

I ran the model for 60 epochs with this code but got poor results: AP = 0.397 in multi-scale mode.

How many epochs did you run? Has anyone reached the AP reported in the paper?

Yishun99 commented 6 years ago

What batchsize and train_iters did you set?

ahangchen commented 6 years ago

First I tried batchsize 16, train_iters 1000, lr 2e-4, and got AP = 0.397 (multi-scale). Then I adjusted to batchsize 16, train_iters 2000, lr 1e-4, and got AP = 0.558 (multi-scale), still far from the result in the paper.

Yishun99 commented 6 years ago

I loaded the pretrained model with `checkpoint = torch.load(resume_file)` and ran `print(checkpoint['epoch'])`,

which printed 362.

ahangchen commented 6 years ago

Amazing! I think I need to train for more epochs.

taojake commented 6 years ago

@DouYishun I trained with batchsize 4 for 450 epochs and got 0.462 (single-scale) and 0.561 (multi-scale), and I noticed that as the number of epochs grows, the results go down. Maybe batchsize is a big deal, but how can I improve it?

Yishun99 commented 6 years ago

@taojake I'm also struggling with only one GPU.

ahangchen commented 6 years ago

I believe that when you decrease the batch size, you need to decrease the learning rate and increase the number of iterations accordingly.
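A minimal sketch of that rule of thumb, assuming linear scaling (the heuristic and base values below are illustrative, not code from this repo): scale the learning rate down and the iteration count up in proportion to the batch-size reduction, so the total number of samples seen per epoch stays constant.

```python
# Rule-of-thumb hyperparameter scaling when shrinking the batch size.
# The linear-scaling heuristic is an assumption for illustration;
# base values mirror the settings discussed in this thread.

def scale_for_batch_size(base_lr, base_iters, base_batch, new_batch):
    """Scale lr down and iteration count up by the batch-size ratio,
    keeping samples-seen-per-epoch constant."""
    factor = new_batch / base_batch          # e.g. 4 / 16 = 0.25
    new_lr = base_lr * factor                # smaller batch -> smaller lr
    new_iters = int(base_iters / factor)     # more iters to see as many samples
    return new_lr, new_iters

lr, iters = scale_for_batch_size(2e-4, 1000, base_batch=16, new_batch=4)
print(lr, iters)  # lr 5e-05, 4000 iterations
```

Whether lr should scale linearly or by the square root of the batch-size ratio is debated; either way the direction matches the advice above.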

liuyu666-thu commented 5 years ago

@ahangchen @taojake I'm confused about train_iters here. What does 1000 train_iters mean? My understanding is batchsize * iter_per_epoch = total_samples, but 1000 iterations is far from enough to iterate through the whole dataset. Please help me.

ahangchen commented 5 years ago

epochs (full passes over the dataset) = train_iters / iter_per_epoch — so with 1000 train_iters, one training "epoch" in this code covers less than a full pass over the dataset.

pavanteja295 commented 5 years ago

You can roughly calculate how many times each image is seen, assuming uniform random sampling from the dataset, and work out the total number of iterations you need for your own batch size accordingly.
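That estimate can be sketched as follows; the dataset size used here is a placeholder, not a value from this repo, so substitute your actual number of training images.

```python
# Expected number of times each image is seen under uniform random
# sampling. dataset_size is a hypothetical placeholder value.

def expected_views_per_image(epochs, train_iters, batch_size, dataset_size):
    """Total samples drawn divided by dataset size gives the expected
    number of views per image under uniform sampling."""
    total_samples = epochs * train_iters * batch_size
    return total_samples / dataset_size

# e.g. 60 epochs x 1000 iters x batch 16 over a hypothetical 40,000-image set:
print(expected_views_per_image(60, 1000, 16, 40_000))  # 24.0
```

Inverting the same formula gives the train_iters needed to hit a target number of views per image at a given batch size.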

coordxyz commented 5 years ago

> @DouYishun i trained on 4 batchsize, 450 epochs, and got 0.462(single) and 0.561(multi) ...

@taojake Did you train it from scratch with input_res 512 and output_res 128? Did you finally reach the AP reported in the paper?