Open yunfei920406 opened 1 year ago
I noticed that in the configs/*****.yaml file, the "model_load_path" parameter is commented out, which results in the aforementioned error. How should I deal with it?
How should I define model_load_path, given that I am training on my own datasets from scratch?
You can leave this commented out.
Thanks for your reply.
Another issue I encountered: in some training epochs, the following errors appear, but not in every epoch. Besides, the training process continues to run despite these errors. Can you tell me why?
I ran into the same problem. I want to train on my own datasets, but how should I set the values of the two parameters, resume_model and resume_optim? Could you give me some advice? Thank you!
Some errors appeared as follows:
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
create output path results/NonCA/LBBDM-f4/
Working with z of shape (1, 3, 64, 64) = 12288 dimensions.
Restored from model/VQGAN/model.ckpt
load vqgan from model/VQGAN/model.ckpt
get parameters to optimize: UNet
Total Number of parameter: 292.41M
Trainable Number of parameter: 237.09M
load model LBBDM-f4 from path/to/model_ckpt
Traceback (most recent call last):
  File "/home/yunfei/Projects/BBDM-main/main.py", line 130, in <module>
    main()
  File "/home/yunfei/Projects/BBDM-main/main.py", line 125, in main
    CPU_singleGPU_launcher(nconfig)
  File "/home/yunfei/Projects/BBDM-main/main.py", line 91, in CPU_singleGPU_launcher
    runner = get_runner(config.runner, config)
  File "/home/yunfei/Projects/BBDM-main/utils.py", line 45, in get_runner
    runner = Registers.runners[runner_name](config)
  File "/home/yunfei/Projects/BBDM-main/runners/DiffusionBasedModelRunners/BBDMRunner.py", line 19, in __init__
    super().__init__(config)
  File "/home/yunfei/Projects/BBDM-main/runners/DiffusionBasedModelRunners/DiffusionBaseRunner.py", line 11, in __init__
    super().__init__(config)
  File "/home/yunfei/Projects/BBDM-main/runners/BaseRunner.py", line 68, in __init__
    self.load_model_from_checkpoint()
  File "/home/yunfei/Projects/BBDM-main/runners/DiffusionBasedModelRunners/BBDMRunner.py", line 37, in load_model_from_checkpoint
    states = super().load_model_from_checkpoint()
  File "/home/yunfei/Projects/BBDM-main/runners/BaseRunner.py", line 105, in load_model_from_checkpoint
    model_states = torch.load(self.config.model.model_load_path, map_location='cpu')
  File "/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/yunfei/anaconda3/envs/BBDM/lib/python3.9/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/model_ckpt'
Thank you for your attention. If you need to train a new model, you only need to leave model_ckpt and optim_ckpt at their defaults.
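The traceback above ends in a FileNotFoundError because the placeholder value 'path/to/model_ckpt' was passed to torch.load. As a rough illustration of the advice (train from scratch when no real checkpoint is configured), here is a minimal sketch of a guard. It is not the repository's code: the helper name resolve_checkpoint and the PLACEHOLDER_PREFIX constant are hypothetical, chosen only for this example.

```python
import os
from typing import Optional

# Hypothetical constant: template configs in this thread use a value like
# 'path/to/model_ckpt' as a placeholder rather than a real file.
PLACEHOLDER_PREFIX = "path/to/"


def resolve_checkpoint(path: Optional[str]) -> Optional[str]:
    """Return a usable checkpoint path, or None to signal 'train from scratch'."""
    # Key absent or commented out in the YAML: nothing to load.
    if path is None or not path.strip():
        return None
    # Placeholder left over from a template config: treat as unset.
    if path.startswith(PLACEHOLDER_PREFIX):
        return None
    # A real path was given but the file is missing: fail early with a clear message.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return path


if __name__ == "__main__":
    print(resolve_checkpoint(None))                  # None
    print(resolve_checkpoint("path/to/model_ckpt"))  # None
```

A caller would then invoke torch.load(ckpt, map_location='cpu') only when the returned value is not None, and otherwise start with freshly initialised weights.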
Thank you very much for replying to my question amid your busy schedule! I tried the method you mentioned. I started training with python3 main.py --config configs/Template-BBDM.yaml --train --sample_at_start --save_top --gpu_ids 0, but I keep running into CUDA out-of-memory errors. I tried training on a GPU with more memory, but the same problem persists. Can you give me some suggestions? (The specific error message is shown in the picture.)
I suggest using LBBDM (latent space) rather than BBDM (pixel space); alternatively, you can decrease the batch size and the number of model channels.
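To get a rough feel for why the latent-space suggestion helps with memory, the sketch below compares the number of tensor elements per sample in latent space and in pixel space. The latent shape (1, 3, 64, 64) comes from the log earlier in the thread. The factor-4 downsampling (read off the "f4" in the model name) and the resulting 256x256 RGB input size are assumptions, and real memory use also depends on model channels, batch size, and optimizer state.

```python
from math import prod

# Latent shape reported in the log: "Working with z of shape (1, 3, 64, 64)".
latent_shape = (1, 3, 64, 64)

# Assumption: "f4" in LBBDM-f4 means the autoencoder downsamples each
# spatial dimension by a factor of 4.
downsample = 4

# Assumed pixel-space input: same batch size, RGB, spatial size scaled back up.
pixel_shape = (
    latent_shape[0],
    3,
    latent_shape[2] * downsample,
    latent_shape[3] * downsample,
)

latent_elems = prod(latent_shape)   # 12288, matching the log
pixel_elems = prod(pixel_shape)     # 196608
ratio = pixel_elems / latent_elems  # 16.0

print(f"latent: {latent_elems}, pixel: {pixel_elems}, ratio: {ratio}")
```

Under these assumptions the diffusion network sees 16 times fewer values per sample in latent space, which is why moving to LBBDM, or shrinking the batch size, tends to relieve out-of-memory errors.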
I have the same question. These warnings seriously affect the speed of training. Have you solved this problem?
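If the messages in question are the torchvision deprecation warnings shown in the log, one option is to filter them with Python's standard warnings module before the model is built. This is only a sketch: the function name silence_torchvision_deprecations is made up for this example, and the module pattern is taken from the file path in the log (torchvision/models/_utils.py). It only hides the messages; the thread does not establish whether they affect training speed.

```python
import warnings


def silence_torchvision_deprecations() -> None:
    """Ignore UserWarnings attributed to torchvision.models._utils."""
    warnings.filterwarnings(
        "ignore",
        category=UserWarning,
        # Regex matched against the start of the module name that issues the warning.
        module=r"torchvision\.models\._utils",
    )


if __name__ == "__main__":
    # Call this once, before constructing the LPIPS / AlexNet model.
    silence_torchvision_deprecations()
```

Warnings from other modules are left untouched, so unrelated problems remain visible.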