@pokameng Hi, yes, it is slow. In our original implementation, sampling each story takes around 30-40 seconds. I am wondering whether the time cost for the first batch includes the prefetch and HDF5 loading time; what is the time cost for the other batches? Also, in sample mode you can use DDP and increase the batch size. For acceptable sample quality, you can try setting the guidance scale to 7.5 and the number of steps to 50 with a PNDM scheduler; this greatly reduces the sampling time.
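If the pipeline is built on Hugging Face diffusers, the scheduler swap might look roughly like this (a hedged sketch; the `beta_*` values shown are the common Stable Diffusion defaults, not necessarily ARLDM's exact settings):

```python
# Hedged sketch: a PNDM scheduler with 50 inference steps via diffusers.
# The beta_* arguments are the usual Stable Diffusion defaults (assumption).
from diffusers import PNDMScheduler

scheduler = PNDMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    skip_prk_steps=True,
)
scheduler.set_timesteps(50)  # 50 denoising steps instead of the full schedule

guidance_scale = 7.5
# Inside the denoising loop, classifier-free guidance combines two predictions:
# noise_pred = noise_uncond + guidance_scale * (noise_text - noise_uncond)
```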
@Flash-321 Can I train and sample at the same time?
@pokameng Sure, you can simply modify the code in https://github.com/Flash-321/ARLDM/blob/eb907e3717ac20f82dfba8e67fd55d95127de098/main.py#L309-L311
```python
def validation_step(self, batch, batch_idx):
    # self.sample returns the ground-truth and generated images for the batch
    original_images, images = self.sample(batch)
    # requires `import torchvision` at the top of main.py
    grid = torchvision.utils.make_grid(images)
    self.logger.experiment.add_image('generated_images', grid, 0)
```
Also make sure to enable this module during training: https://github.com/Flash-321/ARLDM/blob/eb907e3717ac20f82dfba8e67fd55d95127de098/main.py#L82-L97. However, it will slow down the training process, so we recommend manually running a separate job to sample images instead.
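If you do enable it, you can keep the overhead down with standard PyTorch Lightning Trainer flags; a minimal sketch (the argument values are illustrative, not what ARLDM ships with):

```python
# Hedged sketch: run the sampling/validation step only occasionally so
# training stays fast; values below are illustrative examples.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    check_val_every_n_epoch=5,  # sample every 5 epochs instead of every epoch
    limit_val_batches=2,        # sample only a couple of batches each time
)
```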
What I mean is that I have now saved a ckpt while the training process is still running. I have opened another job to sample from the saved ckpt, but I am worried that it will be overwritten by a newer ckpt saved by the ongoing training process. @Flash-321
@pokameng It doesn't matter; once your ckpt is loaded, the sample job no longer relies on the file on disk. You can also copy it to another folder to avoid this situation.
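For example, a tiny Python sketch (the paths are hypothetical) that snapshots the checkpoint before sampling:

```python
# Copy the checkpoint out of the training run's save directory so a newer
# checkpoint cannot overwrite it mid-sampling. Paths are hypothetical examples.
import os
import shutil

os.makedirs("sample_ckpt", exist_ok=True)
shutil.copy("save_ckpt/last.ckpt", "sample_ckpt/last.ckpt")
```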
ok thanks!!!
What is the `test_model_file`?
@rehammsalah here, https://github.com/xichenpan/ARLDM/blob/main/config.yaml#L25
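It is the checkpoint path that sample mode loads. A hedged sketch of the relevant config.yaml entries (the field names come from the repo; the path value is a hypothetical example):

```yaml
# hedged sketch of config.yaml; the path below is a placeholder example
mode: sample                                    # switch from train to sample
test_model_file: /path/to/save_ckpt/last.ckpt   # checkpoint loaded for sampling
```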
How long will the sampling process take to finish? It seems very slow. Why? @Flash-321