sihyun-yu / PVDM

Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
https://sihyun.me/PVDM

Missing ddconfig parameters in configs/latent-diffusion/base.yaml #21

Closed · hychen-naza closed 11 months ago

hychen-naza commented 1 year ago

Hi, I get errors when running:

python main.py \
 --exp ddpm \
 --id main \
 --pretrain_config configs/latent-diffusion/base.yaml \
 --data UCF101 \
 --first_model 'results/first_stage_main_gan_UCF101_42/model_last.pth' \
 --diffusion_config configs/latent-diffusion/base.yaml \
 --batch_size 48

It says the key model.params.ddconfig is missing, and I found that it is indeed not included in base.yaml. Could you help fix this issue?

sihyun-yu commented 11 months ago

Hi,

Can you share the exact error message?

SPengLiang commented 11 months ago

Same problem.

File "/private2/pengliang/VideoSmoother/PVDM/main.py", line 59, in main args.res = first_stage_config.model.params.ddconfig.resolution File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 355, in getattr self._format_and_raise( File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/base.py", line 231, in _format_and_raise format_and_raise( File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/_utils.py", line 899, in format_and_raise _raise(ex, cause) File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/_utils.py", line 797, in _raise raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 351, in getattr return self._get_impl( File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl node = self._get_child( File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 73, in _get_child child = self._get_node( File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 480, in _get_node raise ConfigKeyError(f"Missing key {key!s}") omegaconf.errors.ConfigAttributeError: Missing key ddconfig full_key: model.params.ddconfig object_type=dict

sihyun-yu commented 11 months ago

Hi @SPengLiang,

can you try the following script instead?

 python main.py \
 --exp ddpm \
 --id main \
 --pretrain_config configs/autoencoder/base.yaml \
 --data UCF101 \
 --first_model 'results/first_stage_main_gan_UCF101_42/model_last.pth' \
 --diffusion_config configs/latent-diffusion/base.yaml \
 --batch_size 48
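For context, the error in the traceback comes from main.py reading the first-stage autoencoder settings from the --pretrain_config file: configs/latent-diffusion/base.yaml has no model.params.ddconfig block, while configs/autoencoder/base.yaml does. A minimal sketch of that lookup (illustrative only):

from omegaconf import OmegaConf

# The pretrain config must describe the first-stage autoencoder:
first_stage_config = OmegaConf.load("configs/autoencoder/base.yaml")
res = first_stage_config.model.params.ddconfig.resolution  # works

# The diffusion config defines no ddconfig block, so the same access fails:
wrong_config = OmegaConf.load("configs/latent-diffusion/base.yaml")
res = wrong_config.model.params.ddconfig.resolution  # ConfigAttributeError: Missing key ddconfig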

SPengLiang commented 11 months ago

Thanks for your reply! This new script works, but then I got the following error:

[2023-07-26 19:21:11.400526] [Time 293.230] [FVD 11383.283203]
Traceback (most recent call last):
  File "/private2/pengliang/VideoSmoother/PVDM/main.py", line 93, in <module>
    main()
  File "/private2/pengliang/VideoSmoother/PVDM/main.py", line 71, in main
    torch.multiprocessing.spawn(fn=diffusion, args=(args, ), nprocs=args.n_gpus)
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/private2/pengliang/VideoSmoother/PVDM/exps/diffusion.py", line 142, in diffusion
    latentDDPM(rank, first_stage_model, model, opt, criterion, train_loader, test_loader, lr_scheduler, ema_model, condprob, logger)
  File "/private2/pengliang/VideoSmoother/PVDM/tools/trainer.py", line 51, in latentDDPM
    for it, (x, ) in enumerate(train_loader):
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pengliang/.conda/envs/video/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private2/pengliang/VideoSmoother/PVDM/tools/dataloader.py", line 142, in __getitem__
    prefix = np.random.randint(len(video)-self.nframes+1)
  File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0
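The failing call is the random clip-start sampling in tools/dataloader.py: np.random.randint(high) draws from the half-open range [0, high) and needs high to be at least 1, so any video with fewer than self.nframes frames makes len(video) - self.nframes + 1 non-positive. A tiny illustration with hypothetical numbers:

import numpy as np

nframes = 16                      # hypothetical training clip length
video_len = 10                    # hypothetical clip shorter than nframes
high = video_len - nframes + 1    # -5: no valid start index exists
prefix = np.random.randint(high)  # raises ValueError, as in the traceback above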

sihyun-yu commented 11 months ago

This normally happens if the dataset contains videos that are shorter than self.nframes. You should be able to run the script after removing all of these videos.
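One way to do that (a hypothetical one-off script, not part of the repo; it assumes the UCF-101 clips are .avi files readable by OpenCV and that the dataset path matches your setup) is to move every clip with fewer frames than the training clip length out of the dataset directory:

import shutil
from pathlib import Path

import cv2

nframes = 32                              # training clip length
src = Path("datasets/UCF-101")            # hypothetical dataset root
dst = Path("datasets/UCF-101_too_short")  # where excluded clips are moved
dst.mkdir(exist_ok=True)

for video_path in src.rglob("*.avi"):
    cap = cv2.VideoCapture(str(video_path))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    if n_frames < nframes:
        print(f"excluding {video_path} ({n_frames} frames)")
        shutil.move(str(video_path), str(dst / video_path.name))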

SPengLiang commented 11 months ago

Thanks for your quick reply! I am using the UCF101 dataset; can you give some advice?

sihyun-yu commented 11 months ago

If I remember correctly, there are a few videos in the UCF-101 dataset whose length is shorter than 32 frames. I excluded these videos from training and the impact was negligible. Otherwise, you can consider adding zero-padding at the beginning of these videos so that they can still be used for unconditional modeling.
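For the padding option, a minimal sketch (hypothetical helper, not the repo's code) that prepends black frames until a clip reaches the required length:

import numpy as np

def pad_short_video(video: np.ndarray, nframes: int = 32) -> np.ndarray:
    """Prepend zero (black) frames to a (T, H, W, C) clip so that T >= nframes."""
    t = video.shape[0]
    if t >= nframes:
        return video
    pad = np.zeros((nframes - t, *video.shape[1:]), dtype=video.dtype)
    return np.concatenate([pad, video], axis=0)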

SPengLiang commented 11 months ago

Thanks!