Closed PanchengZhao closed 1 year ago
Hello, I had almost forgotten about this bug. Thanks for pointing it out. It bothered me a lot when I first tried to finetune the model. The problem is caused by the officially released model.
Quick fix: you can --resume from the checkpoint that is saved when training terminates with the error (logs/checkpoints/last.ckpt).
I will modify the repo tomorrow.
Wow, what an amazing quick reply. Thank you so much for your reply, it works!
But I'm still interested in why this error occurs, and would appreciate it if you could let me know after modifying the repo!
KeyError: 'global_step' Have you solved the problem? @PanchengZhao
Yes, just follow what @nickyisadog said to fix it. By the way, although I tried other options once I understood the problem, this was the only one that worked, and it was also the easiest.
@PanchengZhao Thanks for answering my question! I successfully ran the code, but when I trained on the official Palace dataset, I found that the results seemed to get worse with training. I'd like to ask you about the training: how many epochs did you train for, and what was the loss value when you stopped training?
@Nonbiuld Are you trying to train a model that generates a palace from any ordinary image?
@nickyisadog I'm sorry, I wrote the wrong word; it's the Places dataset.
@nickyisadog I went directly to step 3, "3. Finetune Latent diffusion model", but the results get worse with training.
@Nonbiuld I did not attempt to train on the Places dataset, so no further advice on epoch and loss. However, my training process is working fine, and I suspect the problem may lie in the mask settings, and would suggest debugging the dataloader step-by-step to confirm that the mask is being read correctly and passed into the model.
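For anyone following the dataloader suggestion above, here is a minimal sketch of the kind of check that can be run on a single mask pulled out of a batch. It assumes the model expects a binary 0/1 mask with the same height and width as the image, which the thread does not confirm; `check_mask` is a hypothetical helper, not part of the repo.

```python
def check_mask(mask_rows, image_hw):
    """Return a list of human-readable problems found in one mask.

    mask_rows: the mask as a nested list of rows (H x W),
               e.g. obtained from a tensor with tensor.squeeze().tolist()
    image_hw:  (height, width) of the image the mask belongs to
    """
    problems = []
    height = len(mask_rows)
    width = len(mask_rows[0]) if height else 0
    if (height, width) != tuple(image_hw):
        problems.append(f"shape {(height, width)} does not match image {tuple(image_hw)}")

    values = [v for row in mask_rows for v in row]
    if not values:
        problems.append("mask is empty")
        return problems

    distinct = set(values)
    if len(distinct) == 1:
        # An all-zero or all-one mask usually means the file was not read correctly.
        problems.append(f"mask is constant (every pixel is {values[0]})")
    if not distinct <= {0, 1}:
        # Assumption: 0/1 masks are expected; 0..255 values would be flagged here.
        problems.append(f"mask is not binary, range {min(values)}..{max(values)}")
    return problems


print(check_mask([[0, 1], [1, 0]], (2, 2)))      # no problems reported
print(check_mask([[0, 255], [255, 0]], (2, 2)))  # flagged as not binary
```

Running this on the first few batches, just before they are passed to the model, should show whether the mask is empty, unscaled, or a different size from the image.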
@PanchengZhao Thanks for your advice!
@PanchengZhao I have another question for you. Are you using the mask generation tool provided by the author, @nickyisadog?
@Nonbiuld Actually, I did not use the mask generation tool provided by the author because the dataset I used had original masks.
@PanchengZhao @Nonbiuld Yes. It is better to use the original mask instead of a box mask, but for my dataset only the box mask works. You need to try them out.
@nickyisadog @PanchengZhao Thanks for the replies, I'll try all the advice you gave.
CUDA_VISIBLE_DEVICES=0 python main.py --base ldm/models/ldm/inpainting_big/config.yaml --resume logs/checkpoints/last.ckpt --stage 1 -t --gpus 0,
@nickyisadog When I run the above command, I get the following error:
`TypeError: __init__() got an unexpected keyword argument 'embed_dim'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 785, in <module>
if trainer.global_rank == 0:
NameError: name 'trainer' is not defined`
Do you have the original ldm folder from the original repo? Maybe try renaming it, because sys.path may still be referring to that folder.
@nickyisadog thank you for your quick response.
# sys.path.append(os.getcwd() + "/ldm")
ldm_path = os.path.join(os.getcwd(), "ldm")
print('#################ldm_path:############',str(ldm_path))
sys.path.append(ldm_path)
The output is:
#################ldm_path:############ /media/ihsan/nasdrive/Ihsan/workspace/latent-diffusion-inpainting/ldm
#################ldm_path:############ /media/ihsan/nasdrive/Ihsan/workspace/latent-diffusion-inpainting/ldm
which points to the correct ldm folder.
Regarding the original ldm folder: I am using your code.
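Printing the appended path only shows what was added to `sys.path`, not which copy of `ldm` Python actually imports. `sys.path.append` puts the entry at the end of the search list, so an earlier entry (for example an installed package) still wins. Appending the `ldm` directory itself also exposes the modules inside it as top-level names rather than as `ldm`. The sketch below is one way to see where an import would resolve; `resolve_module` is a hypothetical helper, and whether a shadowing copy is really the cause here is only a guess.

```python
import importlib.util
import os


def resolve_module(name):
    """Return the location a top-level import of `name` would load from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        return os.path.abspath(spec.origin)
    # Namespace packages have no single origin file; report their search locations.
    locations = list(spec.submodule_search_locations or [])
    return os.pathsep.join(locations) if locations else spec.origin


if __name__ == "__main__":
    print("ldm resolves to:", resolve_module("ldm"))
    print("expected under: ", os.path.join(os.getcwd(), "ldm"))
```

If the first line prints a path outside the current repo, a different copy of `ldm` is being picked up, and renaming or uninstalling it would be the next thing to try.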
Hey, I hit the same error, "TypeError: __init__() got an unexpected keyword argument 'embed_dim'". Have you figured out why?
Hello @Polaris0421, did you find a solution in the meantime?
I think I just found the parameter 'embed_dim' and deleted it. I'm not sure, as it was a while ago.
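Before deleting a parameter by hand, it can help to see which config entries a constructor would reject. The sketch below compares a `params` dictionary against a class's `__init__` signature using the standard `inspect` module. `unexpected_kwargs` and `ToyAutoencoder` are hypothetical names made up for illustration; the real class is whichever one the config's `target` entry points to, and its signature is not shown in this thread.

```python
import inspect


def unexpected_kwargs(cls, params):
    """Return the keys of `params` that cls.__init__ would reject as keyword arguments."""
    parameters = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return []  # a **kwargs parameter absorbs every extra key
    accepted = {
        name
        for name, p in parameters.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(key for key in params if key not in accepted)


# Hypothetical stand-in for the class named in the config.
class ToyAutoencoder:
    def __init__(self, in_channels, hidden_dim):
        self.in_channels = in_channels
        self.hidden_dim = hidden_dim


params = {"in_channels": 3, "hidden_dim": 128, "embed_dim": 3}
print(unexpected_kwargs(ToyAutoencoder, params))  # ['embed_dim']
```

An entry reported here is either stale in the config or a sign that a different version of the class is being imported than the one the config was written for.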
Thanks for the training code!
When I tried to finetune the Latent diffusion model, some problems occurred.
I modified the original ckpt by removing these four keys from it, and it works.
When I try to continue training, a new error occurs: KeyError: 'global_step'. I have not found a solution for this. Do you know how to fix it?
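A KeyError on 'global_step' during --resume suggests that the checkpoint holds model weights but not the trainer state that PyTorch Lightning saves alongside them. The sketch below lists which of those top-level entries are missing from a loaded checkpoint dictionary. The key names are the ones Lightning checkpoints normally contain. Which of them a resume strictly requires depends on the Lightning version, so treat the list as an assumption.

```python
# Top-level entries a PyTorch Lightning checkpoint normally carries besides the weights.
# Assumption: which of these a resume actually needs varies between Lightning versions.
TRAINER_STATE_KEYS = ("epoch", "global_step", "optimizer_states", "lr_schedulers")


def missing_trainer_state(checkpoint):
    """Return the trainer-state keys that are absent from a loaded checkpoint dict."""
    return [key for key in TRAINER_STATE_KEYS if key not in checkpoint]


def describe(checkpoint):
    """Summarize whether a checkpoint dict looks resumable."""
    missing = missing_trainer_state(checkpoint)
    if missing:
        return "missing trainer state: " + ", ".join(missing)
    return "trainer state present"


# Usage with a real file (requires torch; the path is the one mentioned in the thread):
#   import torch
#   ckpt = torch.load("logs/checkpoints/last.ckpt", map_location="cpu")
#   print(describe(ckpt))

print(describe({"state_dict": {}}))
print(describe({"state_dict": {}, "epoch": 3, "global_step": 1200,
                "optimizer_states": [], "lr_schedulers": []}))
```

A checkpoint that reports missing trainer state can still be used to initialize weights, but resuming from it is likely to fail in the way described above.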