KeyError 'global_step' - Githubissues

nickyisadog / latent-diffusion-inpainting


KeyError 'global_step' #1

Closed PanchengZhao closed 1 year ago

PanchengZhao commented 1 year ago

Thanks for the training code!

When I try to finetune the latent diffusion model, some problems occur.

Error(s) in loading state_dict for LatentDiffusion: Unexpected key(s) in state_dict: "ddim_sigmas", "ddim_alphas", "ddim_alphas_prev", "ddim_sqrt_one_minus_alphas".

I modified the original ckpt by removing these four keys from it, and that works.
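A minimal sketch of that key removal, assuming the weights sit under a top-level "state_dict" entry (with a fallback for a flat mapping). The helper names and any file paths are hypothetical, not part of the repo:

```python
# The four DDIM sampler buffers named in the error message above.
DDIM_KEYS = (
    "ddim_sigmas",
    "ddim_alphas",
    "ddim_alphas_prev",
    "ddim_sqrt_one_minus_alphas",
)


def strip_ddim_keys(checkpoint):
    """Return a shallow copy of `checkpoint` without the DDIM_KEYS entries."""
    # Assumption: weights live under "state_dict"; otherwise treat the
    # checkpoint itself as the flat mapping of weights.
    nested = "state_dict" in checkpoint
    state = dict(checkpoint["state_dict"] if nested else checkpoint)
    for key in DDIM_KEYS:
        state.pop(key, None)  # ignore keys that are already absent
    if not nested:
        return state
    cleaned = dict(checkpoint)
    cleaned["state_dict"] = state
    return cleaned


def clean_checkpoint_file(src, dst):
    """Load `src`, strip the DDIM keys, and write the result to `dst`."""
    import torch  # imported lazily so strip_ddim_keys works without PyTorch

    checkpoint = torch.load(src, map_location="cpu")
    torch.save(strip_ddim_keys(checkpoint), dst)
```

Writing to a new file rather than overwriting the downloaded ckpt keeps the original available if something else turns out to be missing.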

When I try to continue training, a new error occurs:

/anaconda3/envs/sd/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 190, in restore_loops
    self.trainer.fit_loop.global_step = self._loaded_checkpoint["global_step"]
KeyError: 'global_step'

I have not found a solution for this. Do you know how to fix it?

nickyisadog commented 1 year ago

Hello, I almost forgot about this bug. Thanks for pointing it out. It bothered me a lot when I first tried to finetune the model. The problem is caused by the officially released model.

Quick fix: you can --resume from the checkpoint that is saved when the run terminates with the error (logs/checkpoints/last.ckpt).
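The traceback shows that the loaded checkpoint has no top-level 'global_step' entry. Below is a rough sketch for checking which resume-related entries a checkpoint lacks. Only 'global_step' is confirmed by this thread; the other key names are assumptions about what a PyTorch Lightning checkpoint usually carries, and the helper names are hypothetical:

```python
# Top-level entries a resume may look for. Only "global_step" is confirmed
# by the traceback in this thread; the other names are assumptions.
EXPECTED_TRAINER_KEYS = ("epoch", "global_step", "optimizer_states", "lr_schedulers")


def missing_trainer_keys(checkpoint, expected=EXPECTED_TRAINER_KEYS):
    """List the expected top-level keys that `checkpoint` does not contain."""
    return [key for key in expected if key not in checkpoint]


def report_checkpoint_file(path):
    """Print the top-level keys of a checkpoint file and what is missing."""
    import torch  # imported lazily so the helper above works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    print("top-level keys:", sorted(map(str, checkpoint)))
    print("missing for resume:", missing_trainer_keys(checkpoint))
```

Running `report_checkpoint_file` on both the released ckpt and logs/checkpoints/last.ckpt should show whether the latter carries the trainer state that the former lacks.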

I will modify the repo tomorrow.

PanchengZhao commented 1 year ago

Wow, what an amazingly quick reply. Thank you so much, it works!

But I'm still interested in why this error occurs, and would appreciate it if you could let me know after modifying the repo!

Nonbiuld commented 1 year ago

KeyError: 'global_step'. Have you solved the problem? @PanchengZhao

PanchengZhao commented 1 year ago

Yes, just follow what @nickyisadog said to fix it. By the way, although I tried other options once I knew the problem, this was the only one that took effect and was the easiest.

Nonbiuld commented 12 months ago

@PanchengZhao Thanks for answering my question! I successfully ran the code, but when I trained on the official Palace dataset, I found that the results seemed to get worse with training. I'd like to ask you about a few training-related things: how many epochs did you train for, and what was the loss value when you stopped training?

nickyisadog commented 12 months ago

@Nonbiuld Are you trying to train a model that generates a palace from any ordinary image?

Nonbiuld commented 12 months ago

@nickyisadog I'm sorry, I wrote the wrong word. It's the Places dataset.

Nonbiuld commented 12 months ago

@nickyisadog I went directly to step 3, "3. Finetune Latent diffusion model", but the results get worse with training.

PanchengZhao commented 12 months ago

@Nonbiuld I did not attempt to train on the Places dataset, so I have no further advice on epochs and loss. However, my training process is working fine, so I suspect the problem may lie in the mask settings. I would suggest debugging the dataloader step by step to confirm that the mask is being read correctly and passed into the model.
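One way to sanity-check a mask while stepping through the dataloader is to summarise its values. This is only a sketch: the function name is hypothetical, and whether 0 or 1 marks the region to inpaint depends on the dataset code, which this thread does not state.

```python
def mask_summary(mask_rows):
    """Summarise a 2-D mask given as nested sequences of numbers."""
    values = [float(v) for row in mask_rows for v in row]
    if not values:
        raise ValueError("mask is empty")
    unique = sorted(set(values))
    return {
        "min": unique[0],
        "max": unique[-1],
        # True only when every pixel is exactly 0 or 1.
        "is_binary": set(unique) <= {0.0, 1.0},
        # Share of pixels above 0.5; assumes larger values mean "masked".
        "masked_fraction": sum(1 for v in values if v > 0.5) / len(values),
    }
```

For a tensor or array mask, passing a single 2-D slice through `.tolist()` first should be enough. A mask that is all zeros, all ones, or in the 0 to 255 range when the model expects 0 to 1 would be worth a closer look.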

Nonbiuld commented 12 months ago

@PanchengZhao Thanks for your advice!

Nonbiuld commented 12 months ago

@PanchengZhao I have another question for you. Are you using the mask generation tool provided by the author, @nickyisadog?

PanchengZhao commented 12 months ago

@Nonbiuld Actually, I did not use the mask generation tool provided by the author because the dataset I used had original masks.

nickyisadog commented 12 months ago

@PanchengZhao @Nonbiuld Yes. It is better to use the original mask instead of a box mask, but for my dataset only the box mask works. You need to try both.

Nonbiuld commented 12 months ago

@nickyisadog @PanchengZhao Thanks for the replies, I'll try all the advice you have given.

Ihsan149 commented 10 months ago

CUDA_VISIBLE_DEVICES=0 python main.py --base ldm/models/ldm/inpainting_big/config.yaml --resume logs/checkpoints/last.ckpt --stage 1 -t --gpus 0,

@nickyisadog when I run the above command, I get the following error:

`TypeError: __init__() got an unexpected keyword argument 'embed_dim'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 785, in <module>
    if trainer.global_rank == 0:
NameError: name 'trainer' is not defined`

nickyisadog commented 10 months ago

Do you have the original ldm folder from the original repo? Maybe try renaming it, because sys.path may still be referring to that folder.
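To see which copy of a package Python would actually pick up, one option is to ask the import system directly instead of inspecting sys.path by hand. A minimal sketch; the function name is hypothetical:

```python
import importlib.util


def package_locations(name):
    """Return the file-system locations Python would use for package `name`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []  # not importable from the current sys.path
    if spec.submodule_search_locations:
        # Regular and namespace packages expose their directories here.
        return list(spec.submodule_search_locations)
    return [spec.origin]  # plain single-file module
```

Calling `package_locations("ldm")` from the directory where main.py is launched should show whether the resolved folder belongs to this repo or to another checkout. Note that `sys.path.append` adds an entry at the end of the search order, so an earlier entry that also contains an `ldm` package takes precedence.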

Ihsan149 commented 10 months ago

@nickyisadog thank you for your quick response.

# sys.path.append(os.getcwd() + "/ldm")
ldm_path = os.path.join(os.getcwd(), "ldm")
print('#################ldm_path:############',str(ldm_path))
sys.path.append(ldm_path)

The output is:

#################ldm_path:############ /media/ihsan/nasdrive/Ihsan/workspace/latent-diffusion-inpainting/ldm
#################ldm_path:############ /media/ihsan/nasdrive/Ihsan/workspace/latent-diffusion-inpainting/ldm

which points to the correct ldm.

Regarding the original ldm folder: I am using your code.

Polaris0421 commented 6 months ago

Hey, I hit the same error, "TypeError: __init__() got an unexpected keyword argument 'embed_dim'". Have you figured out why?

ultiwinter commented 2 months ago

Hello @Polaris0421, did you find a solution in the meantime?

Polaris0421 commented 2 months ago

I think I just found the parameter 'embed_dim' and deleted it. I'm not sure, as it was a while ago.
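If the parameter lives in the YAML config, the deletion could look roughly like the sketch below, applied to the config after loading it into nested dicts and lists. This is an assumption: the thread does not confirm where 'embed_dim' was removed from, and the function name is hypothetical.

```python
def drop_key(config, key):
    """Return a copy of a nested dict/list config with every `key` entry removed."""
    if isinstance(config, dict):
        return {k: drop_key(v, key) for k, v in config.items() if k != key}
    if isinstance(config, list):
        return [drop_key(item, key) for item in config]
    return config  # scalars are returned unchanged
```

This removes every occurrence of the key, which may be more than is needed. Deleting only the entry under the class named in the full traceback would be the more careful option.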