What batch size and train_iters did you set?
First I tried batch size 16, train_iters 1000, lr 2e-4 and got AP = 0.397 (multi scale). Then I switched to batch size 16, train_iters 2000, lr 1e-4 and got AP = 0.558 (multi scale), still far from the result in the paper.
I loaded the pretrained model:

import torch

# resume_file points to the released pretrained checkpoint
checkpoint = torch.load(resume_file)
print(checkpoint['epoch'])

and got 362.
Amazing! I think I need to train for more epochs.
@DouYishun I trained with batch size 4 for 450 epochs and got 0.462 (single scale) and 0.561 (multi scale). I also found that as the number of epochs grows, the result goes down. Maybe batch size is a big deal, but how can I improve it?
@taojake I'm also struggling with only one GPU.
I believe that when you decrease the batch size, you need to decrease the learning rate and increase the number of iterations.
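For example, one common heuristic (the linear scaling rule, not something this repo prescribes) is to scale the learning rate proportionally to the batch size and the iteration count inversely. A minimal sketch, where the base_* values are assumed reference settings rather than the repo's defaults:

```python
# Hypothetical helper: rescale learning rate and iteration count when the batch size changes.
# base_* values are assumed reference settings, not this repo's defaults.
def rescale_schedule(base_lr, base_iters, base_batch_size, new_batch_size):
    scale = new_batch_size / base_batch_size
    new_lr = base_lr * scale              # linear scaling: smaller batch -> smaller LR
    new_iters = int(base_iters / scale)   # smaller batch -> more iterations to cover the same data
    return new_lr, new_iters

# e.g. going from batch size 16 down to 4:
lr, iters = rescale_schedule(base_lr=2e-4, base_iters=2000, base_batch_size=16, new_batch_size=4)
print(lr, iters)  # 5e-05 8000
```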
@ahangchen @taojake I'm confused about train_iters here. What does 1000 train_iters mean? I understand that batchsize * iter_per_epoch = all_samples_number, but 1000 iterations is far from enough to go through the whole dataset. Please help me.
epoch = train_iters / iter_per_epoch
You can roughly estimate how many times each image is seen (assuming uniform sampling from the dataset) and work out from that the total number of iterations you need for your own batch size.
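As a rough back-of-the-envelope check (the dataset size below is an assumed placeholder, not the actual training-set count):

```python
# Rough estimate of how many times each image is seen during training.
# dataset_size is a placeholder; substitute your actual training-set size.
def approx_epochs(train_iters, batch_size, dataset_size):
    images_seen = train_iters * batch_size
    return images_seen / dataset_size

# With an assumed dataset of ~60k images, 1000 iters at batch size 16
# covers only about a quarter of the data once:
print(approx_epochs(train_iters=1000, batch_size=16, dataset_size=60000))  # ~0.27
```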
@taojake Regarding your comment above (batch size 4, 450 epochs, 0.462 single / 0.561 multi): did you train it from scratch with input_res 512 and output_res 128? Did you finally reach the AP reported in the paper?
Since epoch_num is not specified in the task/pose.py config, we need to set an epoch count ourselves or interrupt the training manually.
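If you don't want to rely on killing the run by hand, one option is to wrap the existing training step in your own loop with a hard epoch cap. A minimal sketch, where train_one_epoch and save_checkpoint are hypothetical stand-ins for whatever the codebase actually provides, and max_epochs is an assumed value:

```python
# Hypothetical driver that stops after max_epochs instead of relying on manual interruption.
# train_one_epoch and save_checkpoint stand in for the repo's actual training/saving code.
def run_training(model, loader, optimizer, train_one_epoch, save_checkpoint, max_epochs=300):
    for epoch in range(max_epochs):
        train_one_epoch(model, loader, optimizer)
        save_checkpoint({'epoch': epoch, 'state_dict': model.state_dict()},
                        f'checkpoint_{epoch:03d}.pt')
```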
I ran the model for 60 epochs with this code but got poor results, AP = 0.397 in multi-scale mode.
How many epochs did you run? Can anyone reach the AP reported in the paper?