williamyang1991 / Rerender_A_Video

[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
https://www.mmlab-ntu.com/project/rerender/

[bug] FileNotFoundError: No such file or directory (os error 2) #138

Closed · jueming0312 closed this issue 3 months ago

jueming0312 commented 3 months ago

OS: Ubuntu 22.04 driver version: 550.90.07 cuda version: 11.8

root@792f6b219936:/workspace/Rerender_A_Video# python3 rerender.py --cfg config/real2sculpture.json
logging improved.
ControlLDM: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads.
[... same message repeated for the remaining attention blocks (query dims 320/640/1280) ...]
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 939k/939k [00:00<00:00, 1.34MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512k/512k [00:00<00:00, 880kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 917kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 2.00MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.41k/4.41k [00:00<00:00, 7.49MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.59G/1.59G [01:23<00:00, 20.4MB/s]
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads.
[... same message repeated for the remaining attention blocks (query dims 320/640/1280) ...]
Loaded model config from [./deps/ControlNet/models/cldm_v15.yaml]
Loaded state_dict from [./models/control_sd15_canny.pth]
Traceback (most recent call last):
  File "/workspace/Rerender_A_Video/rerender.py", line 466, in <module>
    rerender(cfg, args.one, args.key_video_path)
  File "/workspace/Rerender_A_Video/rerender.py", line 89, in rerender
    model.load_state_dict(load_file(cfg.sd_model), strict=False)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 98, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
FileNotFoundError: No such file or directory (os error 2)
williamyang1991 commented 3 months ago

You didn't download the model.

https://github.com/williamyang1991/Rerender_A_Video/blob/dfaf9d8825f226a2f0a0b731ab2adc84a3f2ebd2/rerender.py#L86-L89
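The `FileNotFoundError` comes from `safetensors.torch.load_file`: the `sd_model` path in the config does not point to an existing file. As a sketch only (the `load_sd_weights` helper below is hypothetical, not part of the repo), one could fail fast with a clearer message before handing the path to safetensors:

```python
import os


def load_sd_weights(path):
    # Fail fast with an explicit message instead of safetensors'
    # terse "No such file or directory (os error 2)".
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"SD checkpoint not found: {path!r}. "
            "Download it into ./models/ and point 'sd_model' in the "
            "config at it.")
    # Import lazily so the existence check works even without safetensors.
    from safetensors.torch import load_file
    return load_file(path)
```

With this guard, a wrong path in `config/real2sculpture.json` reports the offending filename directly instead of a bare os error.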

jueming0312 commented 3 months ago

I have now downloaded two models and modified sd_model_cfg.py. Should I modify any other configurations?

root@9ebfd57c673a:/workspace/Rerender_A_Video# ll models/
total 19121436
drwxr-xr-x 1 root root         90 Aug  4 08:00 ./
drwxr-xr-x 1 root root        125 Aug  4 08:12 ../
-rw-r--r-- 1 root root 5710753329 Aug  4 07:36 control_sd15_canny.pth
-rw-r--r-- 1 root root 5710750165 Aug  4 07:41 control_sd15_hed.pth
-rw-r--r-- 1 root root   18768907 Aug  4 07:30 gmflow_sintel-0c07dcb3.pth
-rw-r--r-- 1 root root 2132625894 Dec  3  2023 realisticVisionV60B1_v60B1VAE.safetensors
-rw-r--r-- 1 root root 5672745097 Nov 14  2023 revAnimated_v11.safetensors
-rw-r--r-- 1 root root  334695179 Aug  4 07:41 vae-ft-mse-840000-ema-pruned.ckpt
root@9ebfd57c673a:/workspace/Rerender_A_Video# cat sd_model_cfg.py
# The model dict is used for webUI only

model_dict = {
    'Stable Diffusion 1.5': '',
    'revAnimated_v11': 'models/revAnimated_v11.safetensors',
    'realisticVisionV20_v20': 'models/realisticVisionV60B1_v60B1VAE.safetensors'
}
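Since `model_dict` is only consulted by the web UI, a path typo there surfaces late. A quick sanity check (the `missing_models` helper is a hypothetical sketch, not repo code) can verify every configured checkpoint actually exists before launching:

```python
import os


def missing_models(model_dict):
    # Entries with an empty path fall back to the base SD 1.5 weights,
    # so only non-empty paths whose file is absent are reported.
    return {name: path for name, path in model_dict.items()
            if path and not os.path.isfile(path)}
```

Running `missing_models(model_dict)` from the repo root should return `{}` when all checkpoints listed in `sd_model_cfg.py` are in place under `models/`.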
jueming0312 commented 3 months ago

When I run Sample 3 from the example configs, the following error appears.

[ WARN:0@2.322] global loadsave.cpp:241 findDecoder imread_('result/pexels-koolshooters-7322716/video/0271.png'): can't open/read file: check file path/integrity
[ WARN:0@2.322] global loadsave.cpp:241 findDecoder imread_('result/pexels-koolshooters-7322716/video/0272.png'): can't open/read file: check file path/integrity
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/Rerender_A_Video/video_blend.py", line 109, in process_sequences
    process_one_sequence(i, video_sequence)
  File "/workspace/Rerender_A_Video/video_blend.py", line 76, in process_one_sequence
    flow_calc.get_flow(i1, i2, flow_seq[j])
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Rerender_A_Video/flow/flow_utils.py", line 170, in get_flow
    image1 = torch.from_numpy(image1).permute(2, 0, 1).float()
TypeError: expected np.ndarray (got NoneType)
/usr/local/lib/python3.10/dist-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[... same warning repeated twice more ...]
[ WARN:0@101.333] global loadsave.cpp:241 findDecoder imread_('result/pexels-koolshooters-7322716/video/0231.png'): can't open/read file: check file path/integrity
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/Rerender_A_Video/video_blend.py", line 109, in process_sequences
    process_one_sequence(i, video_sequence)
  File "/workspace/Rerender_A_Video/video_blend.py", line 76, in process_one_sequence
    flow_calc.get_flow(i1, i2, flow_seq[j])
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Rerender_A_Video/flow/flow_utils.py", line 170, in get_flow
    image1 = torch.from_numpy(image1).permute(2, 0, 1).float()
TypeError: expected np.ndarray (got NoneType)
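The `TypeError: expected np.ndarray (got NoneType)` is a downstream symptom: the WARN lines show that `cv2.imread` could not find frames such as `0271.png`, and `imread` returns `None` on failure rather than raising, so the `None` only crashes later inside `torch.from_numpy`. The likely cause is that frame extraction produced fewer frames than the config's key-frame range assumes. A minimal guard (the `ensure_frame` helper is a hypothetical sketch, not part of `flow_utils.py`) turns the silent `None` into a clear error at the point of failure:

```python
def ensure_frame(img, path):
    # cv2.imread does not raise on a missing or unreadable file; it
    # silently returns None, which later fails torch.from_numpy with
    # "expected np.ndarray (got NoneType)". Surface the real problem here.
    if img is None:
        raise FileNotFoundError(f"cannot read frame: {path}")
    return img
```

Usage would be `frame = ensure_frame(cv2.imread(path), path)`, making the missing frame number appear in the traceback instead of a bare `TypeError` from a worker process.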