timothybrooks / instruct-pix2pix

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #131

Open kosmels opened 4 weeks ago

kosmels commented 4 weeks ago

Hello,

I am trying to train on a custom dataset with 3x NVIDIA TITAN RTX 24GB. I have already prepared 1-to-1 image pairs, and my seeds.json looks like this: [["0000000", ["0"]], ["0000001", ["1"]], ... All the models initialize fine, but during the validation sanity check I get this error:

...
[rank0]:   File "/code/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 892, in forward
[rank0]:     return self.p_losses(x, c, t, *args, **kwargs)
[rank0]:   File "/code/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 1043, in p_losses
[rank0]:     logvar_t = self.logvar[t].to(self.device)
[rank0]: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Do you know where this could come from? I did not change anything in the source code; I only prepared the data and updated the paths in the train config.

Thanks in advance!

kosmels commented 4 weeks ago

UPDATE: Solved here https://github.com/CompVis/stable-diffusion/pull/851

After a few steps of debugging I found that self.logvar has device==cpu (it is initialized here: https://github.com/timothybrooks/instruct-pix2pix/blob/main/stable_diffusion/ldm/models/diffusion/ddpm_edit.py#L123) while t has device==cuda.

As a small workaround, I move t to the device of self.logvar (cpu) for this indexing step:

logvar_t = self.logvar[t.to(self.logvar.device)].to(self.device)

but I am not sure whether this is OK. If it is, it would still be cleaner to move self.logvar to cuda somewhere, because it seems that self.device==cpu at initialization time, so the tensor is created on the cpu and stays there.
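For illustration, here is a minimal sketch of the second option (letting logvar follow the model to the GPU). It is not taken from the repository or from the linked PR: ToyDiffusion and logvar_at are hypothetical names, and the sketch assumes logvar is a plain, non-learned tensor attribute. Registering it as a non-persistent buffer makes it move with model.to(device) without adding a new key to the checkpoint.

```python
import torch
import torch.nn as nn


class ToyDiffusion(nn.Module):
    """Hypothetical stand-in for the model in ddpm_edit.py; only logvar handling is shown."""

    def __init__(self, num_timesteps=1000, logvar_init=0.0):
        super().__init__()
        logvar = torch.full((num_timesteps,), float(logvar_init))
        # A buffer follows the module across devices (model.to, model.cuda).
        # persistent=False keeps it out of state_dict, so existing checkpoints still load.
        self.register_buffer("logvar", logvar, persistent=False)

    def logvar_at(self, t):
        # Defensive: align the index with the buffer's device before indexing.
        return self.logvar[t.to(self.logvar.device)]


# Falls back to cpu so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyDiffusion().to(device)

t = torch.randint(0, 1000, (4,), device=device)  # timesteps, as sampled in training
logvar_t = model.logvar_at(t)
print(model.logvar.device, t.device, logvar_t.device)
```

With this layout, both the indexed tensor and the indices live on the same device, so the extra .to(self.device) after indexing is no longer needed. If logvar is learned (an nn.Parameter), it already moves with the module and this change does not apply.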

Another question, of course, is why I am getting this error at all. Did you not run into this kind of issue during development?

Evangade commented 1 day ago

Same Problem, thank you very much!