How do you know the model has converged?

yfeng95 / DECA

DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)

How do you know the model has converged? #170

Open bxiong97 opened 1 year ago

bxiong97 commented 1 year ago

Hi, thank you for releasing the code! I have some questions about the training process:

1. How long do you train the model? The configs suggest 1 to 2 epochs in total; how was that number chosen?
2. How do you know it has converged? By "converged" I mean that the validation loss has reached its lowest point.
3. How do you know the model is neither overfitting nor underfitting?
4. What is the validation loss function? Is it the same as the training loss function?

Thank you for your time!
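
For reference, the definition of convergence used in question 2 (validation loss at its lowest point) can be tracked mechanically. The following is only a minimal sketch under that definition: `BestLossTracker`, `patience`, and `min_delta` are hypothetical names and are not part of DECA, and the validation loss is assumed to be a single scalar computed elsewhere.

```python
class BestLossTracker:
    """Remembers the lowest validation loss seen so far.

    Hypothetical helper for illustration; it is not part of the DECA code base.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # evaluations to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_step = None
        self.stale_evals = 0          # evaluations since the last improvement

    def update(self, step, val_loss):
        """Record one evaluation and return True if it is a new best."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_step = step
            self.stale_evals = 0
            return True
        self.stale_evals += 1
        return False

    @property
    def should_stop(self):
        """True once the loss has not improved for `patience` evaluations."""
        return self.stale_evals >= self.patience
```

A training loop could call `update(global_step, val_loss)` after each validation pass, save a checkpoint whenever it returns True, and stop once `should_stop` is True. The checkpoint at `best_step` is then the "lowest validation loss" model in the sense asked about above.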

bxiong97 commented 1 year ago

In the validation step, all the code does is visualize the results. How do we know quantitatively that the model is improving?

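def validation_step(self):
    self.deca.eval()
    try:
        batch = next(self.val_iter)
    except (StopIteration, AttributeError):
        # Iterator exhausted (or not yet created): start a new pass over the validation set.
        self.val_iter = iter(self.val_dataloader)
        batch = next(self.val_iter)
    images = batch['image'].cuda()
    # Flatten any leading dimensions into the batch axis: (..., C, H, W) -> (N, C, H, W).
    images = images.view(-1, images.shape[-3], images.shape[-2], images.shape[-1])
    with torch.no_grad():
        codedict = self.deca.encode(images)
        opdict, visdict = self.deca.decode(codedict)
    # Only a visualization grid is saved; no validation loss is computed here.
    savepath = os.path.join(self.cfg.output_dir, self.cfg.train.val_vis_dir, f'{self.global_step:08}.jpg')
    util.visualize_grid(visdict, savepath)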
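
One way to get a number out of validation, rather than only images, would be to average the loss terms over the whole validation set. This is a sketch, not the authors' method: `mean_validation_losses` and `loss_fn` are hypothetical, and `loss_fn` is assumed to return a dict mapping loss names to scalars for one batch. How to obtain such a dict from DECA's trainer is not shown in this thread. In PyTorch, the call would normally be made after `model.eval()` and inside `torch.no_grad()`.

```python
from collections import defaultdict


def mean_validation_losses(batches, loss_fn):
    """Average each named loss over all validation batches.

    `batches` is any iterable of batches (for example a DataLoader) and
    `loss_fn(batch)` is an assumed callable returning {name: scalar}.
    Both are placeholders for illustration.
    """
    totals = defaultdict(float)
    num_batches = 0
    for batch in batches:
        for name, value in loss_fn(batch).items():
            # float() also accepts a single-element tensor.
            totals[name] += float(value)
        num_batches += 1
    if num_batches == 0:
        raise ValueError("validation set is empty")
    # Unweighted mean over batches; a smaller final batch counts as much as a full one.
    return {name: total / num_batches for name, total in totals.items()}
```

Logging the returned values at a fixed step interval gives a validation curve that can be compared with the training curve, instead of relying on a single randomly drawn batch.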

emlcpfx commented 1 year ago

Were you able to train DECA?

511lyf commented 1 month ago

In the validation step, all the code does is visualize the results. How do we know quantitatively that the model is improving?


Hello, could you please tell me how to determine, from the loss curves saved in TensorBoard, whether the model has converged and trained well? Thank you.
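
A common rule of thumb for reading such curves: training is probably near convergence when the smoothed loss stops decreasing, and overfitting is suspected when the training loss keeps falling while a validation loss rises. The sketch below applies the first check to a list of scalar loss values exported from the logs. `ema_smooth`, `has_plateaued`, `window`, and `rel_tol` are hypothetical names and thresholds chosen for illustration, not anything defined by DECA or TensorBoard.

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average, similar in spirit to a smoothing slider."""
    if not values:
        return []
    smoothed = []
    running = values[0]
    for v in values:
        running = weight * running + (1 - weight) * v
        smoothed.append(running)
    return smoothed


def has_plateaued(values, window=50, rel_tol=0.01):
    """Return True if the loss has stopped improving.

    Compares the mean of the last `window` values with the mean of the
    `window` values before them. A relative improvement below `rel_tol`
    (including a loss that got worse) counts as a plateau.
    """
    if len(values) < 2 * window:
        return False  # not enough history to judge
    recent = sum(values[-window:]) / window
    previous = sum(values[-2 * window:-window]) / window
    if previous == 0:
        return recent == 0
    improvement = (previous - recent) / abs(previous)
    return improvement < rel_tol
```

For example, `has_plateaued(ema_smooth(losses))` on the logged training loss gives a rough yes/no answer. Suitable values for `window` and `rel_tol` depend on how noisy the curve is and how often it is logged, so they need tuning per run.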