xyfJASON / ctrlora

Codebase for "CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation"
Apache License 2.0

Training test! #9

toyxyz opened this issue 2 weeks ago (status: Open)

toyxyz commented 2 weeks ago

I prepared 1000 images and ran a training test. I set max steps to 1000 and the training finished in just 6 minutes, yet the result is very cool!

Do you have any tips for running training?

[image]

[image]

cchance27 commented 2 weeks ago

That's so cool!

Jerry-155 commented 2 weeks ago

File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper raise RuntimeError("Distributed package doesn't have NCCL " "built in") RuntimeError: Distributed package doesn't have NCCL built in Have you ever had that problem?

Jerry-155 commented 2 weeks ago

Administrator@win-20231229PFV MINGW64 /e/CN/ctrlora
$ python scripts/train_ctrlora_finetune.py \
    --dataroot ./data/shoes \
    --config ./configs/ctrlora_finetune_sd15_rank128.yaml \
    --sd_ckpt ./ckpts/sd15/v1-5-pruned.ckpt \
    --cn_ckpt ./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt \
    --name shoes1 \
    --max_steps 5000

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:118: UserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop.
  rank_zero_warn("You defined a validation_step but have no val_dataloader. Skipping val loop.")
E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:280: LightningDeprecationWarning: Base LightningModule.on_train_batch_start hook signature has changed in v1.5. The dataloader_idx argument will be removed in v1.7.
  rank_zero_deprecation(
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:50388 (system error: 10049 - the requested address is not valid in its context).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:50388 (system error: 10049 - the requested address is not valid in its context).
logging improved.
Dataset size: 1163
Number of devices: 1
Batch size per device: 1
Gradient accumulation: 1
Total batch size: 1
No module 'xformers'. Proceeding without it.
ControlFinetuneLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./configs/ctrlora_finetune_sd15_rank128.yaml]
Loaded state_dict from [./ckpts/sd15/v1-5-pruned.ckpt]
Loaded state_dict from [./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt]
Successfully initialize SD from ./ckpts/sd15/v1-5-pruned.ckpt
Successfully initialize ControlNet from ./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt
Traceback (most recent call last):
  File "E:\CN\ctrlora\scripts\train_ctrlora_finetune.py", line 129, in <module>
    trainer.fit(model, dataloader)
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 735, in fit
    self._call_and_handle_interrupt(
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 682, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 770, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1131, in _run
    self.accelerator.setup_environment()
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\accelerators\gpu.py", line 39, in setup_environment
    super().setup_environment()
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 83, in setup_environment
    self.training_type_plugin.setup_environment()
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 192, in setup_environment
    self.setup_distributed()
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 279, in setup_distributed
    init_dist_connection(self.cluster_environment, self.torch_distributed_backend)
  File "E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\utilities\distributed.py", line 386, in init_dist_connection
    torch.distributed.init_process_group(
  File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 761, in init_process_group
    default_pg = _new_process_group_helper(
  File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

toyxyz commented 2 weeks ago

File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper raise RuntimeError("Distributed package doesn't have NCCL " "built in") RuntimeError: Distributed package doesn't have NCCL built in Have you ever had that problem?

I solved the problem with this method. https://github.com/xyfJASON/ctrlora/issues/8

Jerry-155 commented 2 weeks ago

File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper raise RuntimeError("Distributed package doesn't have NCCL " "built in") RuntimeError: Distributed package doesn't have NCCL built in Have you ever had that problem?文件“E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py”,第 886 行,在 _new_process_group_helper 中引发运行时错误(“分布式包没有 NCCL ”“内置”)运行时错误:分布式包没有内置 NCCL 您遇到过这个问题吗?

I solved the problem with this method.我用这个方法解决了这个问题。 #8

Thanks I figured out how to fix it, just need to delete the strategy='ddp', that's it!
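For anyone hitting the same error on Windows: the NCCL backend is not built into the Windows PyTorch wheels, so any code path that initializes torch.distributed for DDP fails. Below is a minimal sketch of the change, assuming the Trainer in scripts/train_ctrlora_finetune.py is constructed roughly like this; every argument except `strategy='ddp'` is illustrative, so adapt it to the actual script.

```python
import pytorch_lightning as pl

# Before: DDP needs a working torch.distributed backend, and NCCL is not
# available on Windows, which raises the RuntimeError above.
# trainer = pl.Trainer(gpus=1, strategy='ddp', precision=32, max_steps=5000)

# After: drop strategy='ddp' so Lightning runs a single process on one GPU.
# torch.distributed is never initialized, so the NCCL error goes away.
trainer = pl.Trainer(gpus=1, precision=32, max_steps=5000)
```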

Jerry-155 commented 2 weeks ago

What NVIDIA graphics card did you train on?

toyxyz commented 2 weeks ago

> What NVIDIA graphics card did you train on?

I used an RTX 4090.

Jerry-155 commented 1 week ago

> What NVIDIA graphics card did you train on?
>
> I used an RTX 4090.

I have a 4080, and I can't train.

toyxyz commented 1 week ago

I trained using 1000 hand-drawn images. It's a little fuzzy because the lines aren't consistent, but it works.

[image]

toyxyz commented 1 week ago

> What NVIDIA graphics card did you train on?
>
> I used an RTX 4090.
>
> I have a 4080, and I can't train.

VRAM usage is not that high; 16 GB should work. Or reduce the batch size.
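If VRAM is still the bottleneck, the usual PyTorch Lightning knobs are a smaller per-device batch size, gradient accumulation, and mixed precision. A minimal sketch, assuming the script wires up a standard Lightning Trainer; the values below are illustrative, not the script's defaults:

```python
import pytorch_lightning as pl

# Memory-saving Trainer settings (illustrative values):
trainer = pl.Trainer(
    gpus=1,
    precision=16,               # mixed precision roughly halves activation memory
    accumulate_grad_batches=4,  # keeps the effective batch size while the per-device batch stays small
    max_steps=5000,
)
# ...then pass a DataLoader built with a smaller batch_size to trainer.fit(...)
```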

Jerry-155 commented 1 week ago

You've got a great training set. Are you from China? I want to train models that recognise different product materials and then optimise the product's light and shadow based on the material.

toyxyz commented 1 week ago

> You've got a great training set. Are you from China? I want to train models that recognise different product materials and then optimise the product's light and shadow based on the material.

You can create datasets using texture images and 3D models that use those textures.
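As a starting point, here is a hypothetical helper for assembling such a dataset of paired images. It assumes the ControlNet-style layout that `--dataroot` typically points at (a `source/` folder of condition images, a `target/` folder of ground-truth renders, and a `prompt.json` with one JSON object per line); the folder names, dataset name, and caption are assumptions, so check the repo's README for the exact format the training script expects.

```python
import json
from pathlib import Path

dataroot = Path("./data/materials")      # hypothetical dataset name
source_dir = dataroot / "source"         # condition images (e.g. texture/material renders)
target_dir = dataroot / "target"         # ground-truth renders lit the way you want

with open(dataroot / "prompt.json", "w", encoding="utf-8") as f:
    for src in sorted(source_dir.iterdir()):
        tgt = target_dir / src.name      # assumes matching filenames in both folders
        if not tgt.exists():
            continue
        record = {
            "source": f"source/{src.name}",
            "target": f"target/{src.name}",
            "prompt": "a product photo",  # replace with a real per-image caption
        }
        f.write(json.dumps(record) + "\n")
```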

crapthings commented 1 week ago

@toyxyz How do you get a 3D pose like this?

[image]

toyxyz commented 1 week ago

> @toyxyz How do you get a 3D pose like this?
>
> [image]

Rendered using Blender.
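For reference, condition images like this can be batch-rendered from a posed character with a short Blender (bpy) script. A minimal sketch, to be run inside Blender; the object name "Armature", the output path, the number of views, and the resolution are all assumptions about your scene:

```python
import math
import bpy

scene = bpy.context.scene
scene.render.resolution_x = 512
scene.render.resolution_y = 512

character = bpy.data.objects["Armature"]   # assumed name of the posed rig

for i in range(8):
    # Spin the character in 45-degree steps so each render shows a different angle.
    character.rotation_euler[2] = math.radians(45 * i)
    scene.render.filepath = f"//renders/pose_{i:03d}.png"   # path relative to the .blend file
    bpy.ops.render.render(write_still=True)
```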

LynnHo commented 1 week ago

> I trained using 1000 hand-drawn images. It's a little fuzzy because the lines aren't consistent, but it works.
>
> [image]

@toyxyz Hi, many thanks for your testing!!! By the way, how did you collect this kind of data?

toyxyz commented 1 week ago

> I trained using 1000 hand-drawn images. It's a little fuzzy because the lines aren't consistent, but it works.
>
> [image]
>
> @toyxyz Hi, many thanks for your testing!!! By the way, how did you collect this kind of data?

I drew it myself, of course!