xuekt98 / BBDM

BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
MIT License

Error for model_load_path. #10

Open yunfei920406 opened 1 year ago

yunfei920406 commented 1 year ago

The following error appeared:

Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
create output path results/NonCA/LBBDM-f4/
Working with z of shape (1, 3, 64, 64) = 12288 dimensions.
Restored from model/VQGAN/model.ckpt
load vqgan from model/VQGAN/model.ckpt
get parameters to optimize: UNet
Total Number of parameter: 292.41M
Trainable Number of parameter: 237.09M
load model LBBDM-f4 from path/to/model_ckpt
Traceback (most recent call last):
  File "/home/yunfei/Projects/BBDM-main/main.py", line 130, in <module>
    main()
  File "/home/yunfei/Projects/BBDM-main/main.py", line 125, in main
    CPU_singleGPU_launcher(nconfig)
  File "/home/yunfei/Projects/BBDM-main/main.py", line 91, in CPU_singleGPU_launcher
    runner = get_runner(config.runner, config)
  File "/home/yunfei/Projects/BBDM-main/utils.py", line 45, in get_runner
    runner = Registers.runners[runner_name](config)
  File "/home/yunfei/Projects/BBDM-main/runners/DiffusionBasedModelRunners/BBDMRunner.py", line 19, in __init__
    super().__init__(config)
  File "/home/yunfei/Projects/BBDM-main/runners/DiffusionBasedModelRunners/DiffusionBaseRunner.py", line 11, in __init__
    super().__init__(config)
  File "/home/yunfei/Projects/BBDM-main/runners/BaseRunner.py", line 68, in __init__
    self.load_model_from_checkpoint()
  File "/home/yunfei/Projects/BBDM-main/runners/DiffusionBasedModelRunners/BBDMRunner.py", line 37, in load_model_from_checkpoint
    states = super().load_model_from_checkpoint()
  File "/home/yunfei/Projects/BBDM-main/runners/BaseRunner.py", line 105, in load_model_from_checkpoint
    model_states = torch.load(self.config.model.model_load_path, map_location='cpu')
  File "/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/model_ckpt'

yunfei920406 commented 1 year ago

I noticed that in the configs/*****.yaml file the model_load_path parameter is commented out, which resulted in the error above. How should I deal with it?

yunfei920406 commented 1 year ago

[Screenshot]

yunfei920406 commented 1 year ago

How should I set model_load_path when I am training on my own dataset from scratch?

xuekt98 commented 1 year ago

You can leave it commented out.
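Why leaving the key commented out should be safe can be illustrated with a minimal sketch. This is not BBDM's actual loader: it assumes the parsed "model" section of the YAML is a plain dict and that a checkpoint is only restored when model_load_path is present. A key written as "model_load_path:" with no value also parses to None in YAML, and the sketch treats that the same as a missing key.

```python
import os
from typing import Optional


def checkpoint_to_load(model_cfg: dict) -> Optional[str]:
    """Return the checkpoint path to restore, or None to train from scratch.

    Sketch only: assumes model_cfg is the parsed "model" section of the YAML.
    """
    path = model_cfg.get("model_load_path")
    if path is None:
        # Key commented out (missing) or left empty: start from random weights.
        return None
    if not os.path.isfile(path):
        # Fail early with a clearer message than torch.load's FileNotFoundError.
        raise FileNotFoundError(
            f"model_load_path={path!r} does not exist; "
            "comment it out to train from scratch"
        )
    return path


if __name__ == "__main__":
    print(checkpoint_to_load({}))  # None -> training from scratch
```

With this guard, a leftover placeholder such as path/to/model_ckpt still raises, but with a message that points at the config key instead of deep inside torch.load.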

yunfei920406 commented 1 year ago


Thanks for your reply.

yunfei920406 commented 1 year ago

Another issue I have encountered: in some training epochs the following errors appear, but not in every epoch. Besides, the training process continues despite these errors. Can you tell me why?


[Screenshot of the error messages]

9ww1 commented 1 year ago

I have the same problem. I want to train on my own dataset, but how should I set the two parameters resume_model and resume_optim? Could you give me some advice? Thank you!


xuekt98 commented 1 year ago

Thank you for your interest. If you want to train a new model, just leave model_ckpt and optim_ckpt at their defaults.
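On the command-line side, "leave as default" amounts to not passing the flags at all. A minimal sketch, assuming (as 9ww1's question suggests) that --resume_model and --resume_optim default to None and only override the config when given. This is not BBDM's main.py, and the optim_sche_load_path key is an assumed name not confirmed in this thread.

```python
import argparse
from typing import Optional, Sequence


def parse_resume_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Both default to None, so omitting the flags means "train a new model".
    parser.add_argument("--resume_model", type=str, default=None)
    parser.add_argument("--resume_optim", type=str, default=None)
    return parser.parse_args(argv)


def apply_resume_args(args: argparse.Namespace, model_cfg: dict) -> dict:
    """Copy resume paths into the model config only when they were given."""
    if args.resume_model is not None:
        model_cfg["model_load_path"] = args.resume_model
    if args.resume_optim is not None:
        # Assumed key name for the optimizer/scheduler checkpoint.
        model_cfg["optim_sche_load_path"] = args.resume_optim
    return model_cfg


if __name__ == "__main__":
    fresh = apply_resume_args(parse_resume_args([]), {})
    print(fresh)  # {} -> nothing to load, training starts from scratch
```

Copying a README-style command verbatim with "--resume_model path/to/model_ckpt" would instead set the placeholder string as the checkpoint path, which matches the FileNotFoundError in the original report.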

---- Replied Message ---- Date: 08/02/2023 11:28 | Subject: Re: [xuekt98/BBDM] Error for model_load_path. (Issue #10)


xuekt98 commented 1 year ago

I suggest using LBBDM (latent space) instead of BBDM (pixel space). Alternatively, you can decrease the batch size and the number of model channels.
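Some rough arithmetic shows why the latent model is lighter. For LBBDM-f4 the log in the first post reports a latent of shape (1, 3, 64, 64) = 12288 values; assuming 256x256 RGB inputs (64 x 4), pixel-space BBDM works on 3 x 256 x 256 = 196608 values per sample, 16 times more, and UNet activation memory grows roughly in proportion. The config helper is a sketch: the key paths data.train.batch_size and model.BB.params.UNetParams.model_channels are assumptions about the template layout, so check them against your own YAML.

```python
from copy import deepcopy


def values_per_sample(image_size: int, channels: int, factor: int) -> int:
    """Values per sample after downsampling each side by `factor` (1 = pixel space)."""
    side = image_size // factor
    return channels * side * side


def shrink_config(config: dict, batch_size: int, model_channels: int) -> dict:
    """Return a copy of a parsed config with a smaller batch size and UNet width.

    The nested key paths are assumptions; adjust them to your config file.
    """
    cfg = deepcopy(config)
    cfg["data"]["train"]["batch_size"] = batch_size
    cfg["model"]["BB"]["params"]["UNetParams"]["model_channels"] = model_channels
    return cfg


if __name__ == "__main__":
    pixel = values_per_sample(256, 3, 1)   # BBDM, pixel space
    latent = values_per_sample(256, 3, 4)  # LBBDM-f4, matches (1, 3, 64, 64)
    print(pixel, latent, pixel // latent)  # 196608 12288 16

    # Hypothetical starting values, only for illustration.
    toy = {"data": {"train": {"batch_size": 8}},
           "model": {"BB": {"params": {"UNetParams": {"model_channels": 128}}}}}
    print(shrink_config(toy, batch_size=2, model_channels=64))
```

Copying the config rather than editing it in place keeps the original template available for comparison runs.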

---- Replied Message ---- Date: 08/02/2023 11:57 | Subject: Re: [xuekt98/BBDM] Error for model_load_path. (Issue #10)

Thank you very much for answering my question despite your busy schedule! I tried the method you mentioned. For training I ran python3 main.py --config configs/Template-BBDM.yaml --train --sample_at_start --save_top --gpu_ids 0, but I keep running into CUDA out-of-memory errors. I also tried training on a GPU with more memory, but the same problem persists. Can you give me some suggestions for getting past this? (The specific error message is shown in the picture.)

 


 

------------------ Original Message ------------------ From: "xuekt98/BBDM"; Sent: Wednesday, August 2, 2023, 11:35 AM; Subject: Re: [xuekt98/BBDM] Error for model_load_path. (Issue #10)


Wenting-Xu commented 11 months ago


I have the same question. These warnings seriously affect the speed of training. Have you solved this problem?
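The two torchvision UserWarnings in the log are emitted once while LPIPS builds its AlexNet backbone at startup, so they are unlikely to slow training by themselves; they mainly clutter the output. If you want to hide them, here is a minimal sketch using the standard warnings module. It must run before the LPIPS loss is constructed, and it filters only these two messages.

```python
import warnings


def silence_torchvision_weight_warnings() -> None:
    """Ignore the two torchvision 0.13+ deprecation warnings seen in the log.

    `message` is a regex matched against the start of the warning text.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"The parameter 'pretrained' is deprecated",
        category=UserWarning,
    )
    warnings.filterwarnings(
        "ignore",
        message=r"Arguments other than a weight enum",
        category=UserWarning,
    )


if __name__ == "__main__":
    silence_torchvision_weight_warnings()
    # Stand-in for the warning torchvision raises; it is now suppressed.
    warnings.warn("The parameter 'pretrained' is deprecated since 0.13", UserWarning)
    print("no warning printed above")
```

Filtering by message rather than ignoring all UserWarnings keeps any other warnings raised during training visible.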