toyxyz opened this issue 1 month ago
That's so cool!
File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

Have you ever had that problem?
Administrator@win-20231229PFV MINGW64 /e/CN/ctrlora
$ python scripts/train_ctrlora_finetune.py \
--dataroot ./data/shoes \
--config ./configs/ctrlora_finetune_sd15_rank128.yaml \
--sd_ckpt ./ckpts/sd15/v1-5-pruned.ckpt \
--cn_ckpt ./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt \
--name shoes1 \
--max_steps 5000
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:118: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.")
E:\CN\ctrlora\myenv\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:280: LightningDeprecationWarning: Base `LightningModule.on_train_batch_start` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.
  rank_zero_deprecation(
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:50388 (system error: 10049 - The requested address is not valid in its context).
logging improved.
Dataset size: 1163
Number of devices: 1
Batch size per device: 1
Gradient accumulation: 1
Total batch size: 1
No module 'xformers'. Proceeding without it.
ControlFinetuneLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./configs/ctrlora_finetune_sd15_rank128.yaml]
Loaded state_dict from [./ckpts/sd15/v1-5-pruned.ckpt]
Loaded state_dict from [./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt]
Successfully initialize SD from ./ckpts/sd15/v1-5-pruned.ckpt
Successfully initialize ControlNet from ./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt
Traceback (most recent call last):
  File "E:\CN\ctrlora\scripts\train_ctrlora_finetune.py", line 129, in <module>
  File "E:\CN\ctrlora\myenv\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
I solved the problem with this method. https://github.com/xyfJASON/ctrlora/issues/8
Thanks, I figured out how to fix it: just delete `strategy='ddp'` and that's it!
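For anyone hitting the same wall: the error is a PyTorch-on-Windows limitation, not a ctrlora bug. Windows builds of PyTorch ship without NCCL, so Lightning's `ddp` strategy (which defaults to the NCCL backend on GPUs) cannot initialize its process group; deleting the strategy makes the script train single-process and never create a group at all. A minimal sketch of the backend situation (the address/port values here are arbitrary):

```python
import os
import torch.distributed as dist

# Windows wheels of PyTorch are compiled without NCCL, which is exactly
# what the RuntimeError above reports; Gloo is the backend that does ship.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())

# A single-process Gloo group initializes fine (address/port are arbitrary):
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print("Initialized backend:", dist.get_backend())
dist.destroy_process_group()
```

On a single GPU, dropping `strategy='ddp'` is the simplest fix; forcing the `gloo` backend is the alternative if you really need DDP on Windows.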
What NVIDIA graphics card did you train on?
> What NVIDIA graphics card did you train on?

I used an RTX 4090.
> What NVIDIA graphics card did you train on?
>
> I used an RTX 4090.

I'm on a 4080. I can't train.
I trained using 1000 hand-drawn images. It's a little fuzzy because the lines aren't consistent, but it works.
> What NVIDIA graphics card did you train on?
>
> I used an RTX 4090.
>
> I'm on a 4080. I can't train.

VRAM usage is not that high; 16 GB should be enough. Or reduce the batch size.
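If memory is the limit, the per-device batch size can be set with the script's `--bs` flag (the same flag used in the commands elsewhere in this thread); for example, a run on a 16 GB card might look like:

```shell
python scripts/train_ctrlora_finetune.py \
    --dataroot ./data/shoes \
    --config ./configs/ctrlora_finetune_sd15_rank128.yaml \
    --sd_ckpt ./ckpts/sd15/v1-5-pruned.ckpt \
    --cn_ckpt ./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt \
    --name shoes1 \
    --bs 1 \
    --max_steps 5000
```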
You've got a great training set. Is it from China? I want to train models that recognize different product materials and then optimize the product's light and shadow based on those materials.
You can create datasets using texture images and 3D models that use those textures.
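If ctrlora follows the usual ControlNet custom-dataset layout (a `source/` folder of condition images, a `target/` folder of ground-truth renders, and a `prompt.json` with one JSON record per pair) — an assumption here, so check the repo's dataset docs — pairing up the rendered images could be sketched as:

```python
import json
from pathlib import Path

def build_prompt_index(dataroot, prompt_text):
    """Pair condition images in source/ with same-named renders in target/
    and write one JSON line per pair. The source/target/prompt.json layout
    and key names follow ControlNet's custom-dataset convention and are
    assumed, not taken from the ctrlora repo."""
    root = Path(dataroot)
    records = []
    for src in sorted((root / "source").glob("*.png")):
        tgt = root / "target" / src.name
        if tgt.exists():  # skip renders without a matching pair
            records.append({
                "source": f"source/{src.name}",
                "target": f"target/{src.name}",
                "prompt": prompt_text,
            })
    with open(root / "prompt.json", "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return len(records)
```

With Blender you can batch-render the same 3D model under each texture into `target/` and render the matching condition map into `source/`, then run this once over the data root.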
@toyxyz How did you get a 3D pose like this?
Rendered using Blender.
> I trained using 1000 hand-drawn images. It's a little fuzzy because the lines aren't consistent, but it works.

@toyxyz Hi, many thanks for your testing! By the way, how did you collect this kind of data?
> I trained using 1000 hand-drawn images. It's a little fuzzy because the lines aren't consistent, but it works.
>
> @toyxyz Hi, many thanks for your testing! By the way, how did you collect this kind of data?

I drew it myself, of course!
@toyxyz Great work!! Did you use all the default settings to train these? I've been trying myself with a dataset of 1000 rough lineart-to-art pairs and the following command, but it doesn't seem to be learning anything...
python scripts/train_ctrlora_finetune.py \
--dataroot ./data/dataset \
--config ./configs/ctrlora_finetune_sd15_rank128.yaml \
--sd_ckpt ./ckpts/sd15/v1-5-pruned.ckpt \
--cn_ckpt ./ckpts/control_sd15_init.pth \
--name lineart_custom \
--bs 1 \
--img_logger_freq 2000 \
--ckpt_logger_freq 2000
@ccnguyen you need to train your LoRA with a well-trained Base ControlNet (ctrlora_sd15_basecn700k.ckpt), like:
python scripts/train_ctrlora_finetune.py \
--dataroot ./data/dataset \
--config ./configs/ctrlora_finetune_sd15_rank128.yaml \
--sd_ckpt ./ckpts/sd15/v1-5-pruned.ckpt \
--cn_ckpt ./ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt \
--max_steps 1000
I prepared 1000 images and ran a training test. I set the max steps to 1000 and the training finished in 6 minutes, but the result is very cool!
Do you have any tips for running training?