ygtxr1997 / CelebBasis

Official Implementation of 'Inserting Anybody in Diffusion Models via Celeb Basis'
https://celeb-basis.github.io/
MIT License

raise MisconfigurationException(f"No {loader_name}() method defined to run Trainer.{trainer_method}.") #13

Open euminds opened 1 year ago

euminds commented 1 year ago

I got this error:

Epoch 0: 80%|▊| 800/1001 [06:35<01:39, 2.03it/s, loss=0.0671, v_num=0, train/loss_simple_step=0.0197, train/loss_vlb_step=7.03e-5, train/loss_step=0
Epoch 0, global step 799: val/loss_simple_ema was not in top 1
Average Epoch time: 395.59 seconds
Average Peak memory 19447.18MiB
Epoch 0: 80%|▊| 801/1001 [06:35<01:38, 2.02it/s, loss=0.0671, v_num=0, train/loss_simple_step=0.0197, train/loss_vlb_step=7.03e-5, train/loss_step=0
Saving latest checkpoint...

Traceback (most recent call last):
  File "main_id_embed.py", line 817, in <module>
    trainer.test(model, data)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
    results = self._run(model, ckpt_path=self.tested_ckpt_path)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1128, in _run
    verify_loop_configurations(self)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 42, in verify_loop_configurations
    __verify_eval_loop_configuration(trainer, model, "test")
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 186, in __verify_eval_loop_configuration
    raise MisconfigurationException(f"No {loader_name}() method defined to run Trainer.{trainer_method}.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No test_dataloader() method defined to run Trainer.test.
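For reference, this exception is raised because Trainer.test() is called at the end of main_id_embed.py while the data module defines no test_dataloader() hook. Below is a minimal sketch of one way around it, assuming the data module follows the usual LightningDataModule pattern; the class, dataset, and argument names here are hypothetical, not the repository's actual ones:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class FaceDataModule(pl.LightningDataModule):
    """Hypothetical stand-in for the project's data module."""

    def __init__(self, train_set, val_set, batch_size=2, num_workers=4):
        super().__init__()
        self.train_set = train_set
        self.val_set = val_set
        self.batch_size = batch_size
        self.num_workers = num_workers

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size,
                          shuffle=True, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size,
                          num_workers=self.num_workers)

    def test_dataloader(self):
        # Minimal workaround: reuse the validation data so that
        # Trainer.test(model, data) has a loader to run on.
        return DataLoader(self.val_set, batch_size=self.batch_size,
                          num_workers=self.num_workers)
```

Alternatively, the final trainer.test(model, data) call could simply be skipped; the log above shows it is only reached after "Saving latest checkpoint...", so the trained weights should already be on disk by then.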

env: I have configured the environment by following these steps:

I used the command conda env create -f environment.yaml to create the environment based on the specifications provided in the environment.yaml file.

Then I activated the environment with conda activate celebbasis, so I am working inside this specific environment.

Regarding the dependencies mentioned:

The line # - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers refers to the "taming-transformers" library. Due to network issues, I independently installed this dependency using the command pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers.

Similarly, the line # - -e git+https://github.com/openai/CLIP.git@main#egg=clip refers to the "CLIP" library. I installed this dependency separately as well, using the command pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip.

Next, about face alignment: I aligned the images in Img, like id0(1).jpg, ..., id0(10).jpg. I think this is a minor issue. Then I ran bash ./01_start_train.sh ./weights/sd-v1-4-full-ema.ckpt and hit the error above.

Thanks

ygtxr1997 commented 1 year ago

I think the command line output below means the training process has finished, since max_steps: 800 is set in aigc_id.yaml:

Epoch 0: 80%|▊| 801/1001 [06:35<01:38, 2.02it/s ...
euminds commented 1 year ago

I think the command line output below means the training process has finished, since max_steps: 800 is set in aigc_id.yaml:

Epoch 0: 80%|▊| 801/1001 [06:35<01:38, 2.02it/s ...

Another error: I run the generation step, bash ./02_start_test.sh ./weights/sd-v1-4-full-ema.ckpt ./infer_images/example_prompt.txt training2023-09-07T09-11-56_celebbasis, following these steps from the README:

  1. Generation: Edit the prompt file ./infer_images/example_prompt.txt, where sks denotes the first identity and ks denotes the second identity.

Optionally, in ./02_start_test.sh, you may modify the following var as you need:

step_list=(799)    # the step of trained '.pt' files, e.g. (99 199 299 399)
eval_id1_list=(0)  # the ID index of the 1st person, e.g. (0 1 2 3 4)
eval_id2_list=(1)  # the ID index of the 2nd person, e.g. (0 1 2 3 4)

Testing

bash ./02_start_test.sh "./weights/sd-v1-4-full-ema.ckpt" "./infer_images/example_prompt.txt" "traininYYYY-MM-DDTHH-MM-SS_celebbasis"

The generated images are under ./outputs/traininYYYY-MM-DDTHH-MM-SS_celebbasis.

I got another error:

Traceback (most recent call last):
  File "scripts/stable_txt2img.py", line 385, in <module>
    main()
  File "scripts/stable_txt2img.py", line 337, in main
    samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/models/diffusion/ddim.py", line 96, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/models/diffusion/ddim.py", line 149, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/models/diffusion/ddim.py", line 177, in p_sample_ddim
    e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/data_heat/rjt_project/CelebBasis/ldm/models/diffusion/ddpm.py", line 1044, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/models/diffusion/ddpm.py", line 1545, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/diffusionmodules/openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/attention.py", line 258, in forward
    x = block(x, context=context)
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/attention.py", line 209, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/diffusionmodules/util.py", line 116, in checkpoint
    return func(*inputs)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "/home/user/miniconda3/envs/celebbasis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_heat/rjt_project/CelebBasis/ldm/modules/attention.py", line 189, in forward
    attn = sim.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 23.69 GiB total capacity; 8.60 GiB already allocated; 4.86 GiB free; 16.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

ygtxr1997 commented 10 months ago

Did you free the GPU memory before running the generation code? Typically, your GPU with 24GB memory is sufficient for our code.
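If the out-of-memory error persists even though nothing else should be using the card, one option is a small pre-flight check that lists what is holding GPU memory and then launches the test script with the allocator hint from the error message. This is only a sketch based on the commands in this thread; the paths and arguments are copied from the report above, and the 512 MB split size is an assumed value, not a project recommendation:

```python
# Hypothetical pre-flight wrapper around ./02_start_test.sh.
import os
import subprocess

# A stale training process that has not fully exited is a common reason
# why an 8 GiB allocation fails on an otherwise sufficient 24 GiB card.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# Pass the allocator hint suggested by the OOM message to the child process.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:512")
subprocess.run(
    [
        "bash", "./02_start_test.sh",
        "./weights/sd-v1-4-full-ema.ckpt",
        "./infer_images/example_prompt.txt",
        "training2023-09-07T09-11-56_celebbasis",
    ],
    env=env,
    check=True,
)
```

The same effect can be had by exporting PYTORCH_CUDA_ALLOC_CONF in the shell before running the script; generating fewer images per batch also lowers the peak memory used by the attention softmax that fails here.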