ubc-vision / Prompting-Hard-Hardly-Prompting

Apache License 2.0

[Bug]: Fail to execute main_textual_inversion.py #1

Closed. pzs19 closed this issue 1 month ago

pzs19 commented 1 month ago

Describe the bug

First, thanks for the great work! I encountered an error when executing main_textual_inversion.py:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I tried specifying gpus in inversion_config.yaml, but that led to another error:

ValueError: bad value(s) in fds_to_keep

Steps to reproduce

  1. Download embedding_matrix.pt to ./embedding_matrix.pt
  2. Download model.ckpt to models/ldm/stable-diffusion-v1/model.ckpt
  3. Run main_textual_inversion.py with:
    python main_textual_inversion.py --base configs/latent-diffusion/inversion_config.yaml --train True

Logs

Traceback (most recent call last):
  File "main_textual_inversion.py", line 792, in <module>
    trainer.fit(model, data)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
    self._run(model)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
    self.accelerator.start_training(self)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
    return self._run_train()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_train
    self.fit_loop.run()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 130, in advance
    batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 101, in run
    super().run(batch, batch_idx, dataloader_idx)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 148, in advance
    result = self._run_optimization(batch_idx, split_batch, opt_idx, optimizer)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 202, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 396, in _optimizer_step
    model_ref.optimizer_step(
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1618, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 209, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 129, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 296, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 303, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 226, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/optim/lbfgs.py", line 311, in step
    orig_loss = closure()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 236, in _training_step_and_backward_closure
    result = self.training_step_and_backward(split_batch, batch_idx, opt_idx, optimizer, hiddens)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 549, in training_step_and_backward
    self.backward(result, optimizer, opt_idx)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 590, in backward
    result.closure_loss = self.trainer.accelerator.backward(result.closure_loss, optimizer, *args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 278, in backward
    closure_loss = self.precision_plugin.post_backward(self.lightning_module, closure_loss)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 91, in post_backward
    model.trainer.call_hook("on_after_backward")
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1224, in call_hook
    output = hook_fx(*args, **kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/qianhuiwu/zspan/DeComp/baselines/Prompting-Hard-Hardly-Prompting/ldm/models/diffusion/ddpm.py", line 1506, in on_after_backward
    self.cond_stage_model.transformer.text_model.embeddings.textual_embedding.textual_inv_embedding = self.cond_stage_model.transformer.text_model.embeddings.textual_embedding.textual_inv_embedding - self.optimizers().param_groups[0]['lr']*self.cond_stage_model.transformer.text_model.embeddings.textual_embedding.projected_textual_inv_embedding.grad
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

After specifying gpus in inversion_config.yaml:

....
lightning:
  trainer:
    gpus: "1,2,3"
    benchmark: True
    max_steps: 500
    accumulate_grad_batches: 3

Another error occurs:

Traceback (most recent call last):
  File "main_textual_inversion.py", line 792, in <module>
    try:
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
    self._run(model)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
    self.accelerator.start_training(self)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 158, in start_training
    mp.spawn(self.new_process, **self.mp_spawn_kwargs)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
    process.start()
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 58, in _launch
    self.pid = util.spawnv_passfds(spawn.get_executable(),
  File "/home/qianhuiwu/anaconda3/envs/sd_zspan/lib/python3.8/multiprocessing/util.py", line 452, in spawnv_passfds
    return _posixsubprocess.fork_exec(
ValueError: bad value(s) in fds_to_keep

Additional Information

Python=3.8.5

s-mahajan commented 1 month ago

Hello, thanks for your interest in our work! You need to specify a single GPU to run the code. Hope this helps!
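
For reference, a single-GPU variant of the trainer block quoted in the report might look like the fragment below. This is a sketch, not a confirmed fix: it assumes the script forwards `lightning.trainer.gpus` to PyTorch Lightning unchanged, and that the Lightning version shown in the traceback reads a comma-separated string such as `"0,"` as a list of device indices (here just index 0). The index 0 is only an example; the other keys are copied from the report.

```yaml
lightning:
  trainer:
    # Assumption: "0," selects the single device with index 0.
    # The trailing comma is assumed to mark the value as a list of ids, not a device count.
    gpus: "0,"
    benchmark: True
    max_steps: 500
    accumulate_grad_batches: 3
```

Requesting only one device should also keep Lightning off the `ddp_spawn` path seen in the second traceback, which is where `bad value(s) in fds_to_keep` was raised.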