smy20011 / dreambooth-gui

MIT License
363 stars 19 forks source link

Everything runs fine, but results are non-existent. #16

Closed Grimig closed 2 years ago

Grimig commented 2 years ago

I've tried multiple runs and everything seems to run just fine, except the final model seems not to have received any training at all. I've trained on Colab before using the same settings and the results were good. The only suspicious thing I notice is that I'm getting more it/s than I probably should: around 2 it/s with my 3800.

I read a similar comment in a Reddit thread about your program; he also had a 3800, from what I remember. Might just be a coincidence.

smy20011 commented 2 years ago

I think you need to wait a bit after training finishes. Do you mind waiting 5 minutes after the training bar disappears?

Grimig commented 2 years ago

Got the "finished" popup screen, waited a solid 15 min, and no sign of training on the model.

I've attached a photo of the training in progress, in case something looks off. I also tried batch size 2, just because, but got the same result.

[Attached image: training in progress]

jtkelm2 commented 2 years ago

I have this same issue, right down to the suspiciously high it/s. No matter what training parameters I pass to the thing, the output model is identical to the base model. You can test this by running the same prompt on the same seed on both.

I'm on a 3060 12gb, though I'm starting to suspect this might be a universally experienced problem? I've looked around and not found anybody who's shared their results.
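One quick way to check the "identical to the base model" observation, before comparing generated images, is to hash the two checkpoint files. The following is only a minimal sketch: the function names and file paths are hypothetical, and it assumes both models exist as single checkpoint files on disk. Matching hashes mean the files are byte-for-byte identical; differing hashes only show that the files differ, not that training was effective.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so multi-gigabyte checkpoints fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(first: Path, second: Path) -> bool:
    """True if both files have identical bytes (compared via SHA-256)."""
    return file_sha256(first) == file_sha256(second)


# Hypothetical usage; these paths are placeholders, not paths from this thread:
# same_file_contents(Path("base_model.ckpt"), Path("output/model.ckpt"))
```

If this returns True for the base and "trained" checkpoints, no weights were written at all, which is a stronger statement than "the images look the same".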

smy20011 commented 2 years ago

Interesting, do you have the full training command and log?

jtkelm2 commented 2 years ago

Interesting, do you have the full training command and log?

Unfortunately at this point I do not. If you instead want to give me a training command (which you are sure works on your end), I can confirm if the same thing still happens on my end.

Also, this may not be relevant, but it's hard to tell whether the gui is actually finding the images I want it to. I press "select folder" and it only boots me to the next screen, without verifying e.g. how many images it finds. If there's no training going on, just maybe it's because it's not finding the provided instance images?
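To rule out the "not finding the instance images" possibility independently of the GUI, one can count the image files in the selected folder. This is a rough sketch under stated assumptions: the helper name and the list of extensions are hypothetical (the thread does not say which formats the training script accepts), and it only looks at files directly inside the folder, not in subfolders.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the trainer accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def count_images(folder: Path) -> Counter:
    """Count image files directly inside `folder`, keyed by lowercase suffix."""
    return Counter(
        entry.suffix.lower()
        for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES
    )


# Hypothetical usage; the path is a placeholder:
# counts = count_images(Path("D:/training/instance"))
# print(sum(counts.values()), "images found:", dict(counts))
```

A total of zero for the folder that was selected would suggest the wrong directory level was chosen (for example, the parent folder instead of the folder holding the images).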

smy20011 commented 2 years ago

Weird, it should open a dialog and let you choose a folder. If you have Discord, we can try to debug together.

jtkelm2 commented 2 years ago

It does open a dialogue and let me choose the folder, but it's still not clear if it's (1) navigating to and accessing the images without (e.g. permission) errors, and subsequently, (2) whether I should press "select folder" when I'm inside the folder, or when I'm in the parent directory while having the desired folder highlighted (or whether both methods work). Probably not the problem in this case, though, and it's just my UX suggestion.

Anyway yeah, add me at Paracompact#0249

Papricatia commented 2 years ago

I have the same problem. I run the program and everything seems to run just fine, except the final model seems to have not received any training at all ...

fredconex commented 2 years ago

I'm also having the same problem. I was wondering what could be wrong, since it runs fine in the Colab, but in the GUI the "trained" model shows no related results.

SmezMorePrakezz commented 2 years ago

I also have the same problem. I'm sharing my settings and logs here.

Training Process: docker run --pull=always -t --gpus=all -v=C:\Users\SmezMorePrakezz\AppData\Roaming\smy20011.dreambooth\:/train -v=D:\Dreambooth_Training\SageInput:/instance -v=D:\Dreambooth_Training\SageOutput:/output -e HUGGING_FACE_HUB_TOKEN=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 -v=C:\Users\SmezMorePrakezz\AppData\Roaming\smy20011.dreambooth\loli:/class smy20011/dreambooth:latest /start_training /train_dreambooth.py --pretrained_model_name_or_path=hakurei/waifu-diffusion --instance_prompt=sageai loli --instance_data_dir=/instance --class_data_dir=/class --with_prior_preservation --prior_loss_weight=1.0 --class_prompt=loli --output_dir=/output --resolution=512 --max_train_steps=500 --learning_rate=5e-6 --lr_scheduler=constant --lr_warmup_steps=0 --mixed_precision=fp16 --train_batch_size=1 --gradient_accumulation_steps=1 --use_8bit_adam

Training output (shortening redundant line as ...)

latest: Pulling from smy20011/dreambooth
Digest: sha256:4cf73ab423b42eb0692d3f4ecf7e4729f31d2bcdff9fa283825cbee6de3d015f
Status: Image is up to date for smy20011/dreambooth:latest

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/opt/conda/lib/python3.7/site-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  "WARNING: The following directories listed in your path were found to "
/opt/conda/lib/python3.7/site-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
  f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...

Caching latents:   0%|          | 0/100 [00:00<?, ?it/s]
Caching latents:   1%|          | 1/100 [00:06<11:20,  6.87s/it]
Caching latents:   2%|▏         | 2/100 [00:07<04:49,  2.96s/it]
...
...
...
Caching latents: 100%|██████████| 100/100 [00:25<00:00,  3.92it/s]

  0%|          | 0/500 [00:00<?, ?it/s]
Steps:   0%|          | 0/500 [00:00<?, ?it/s]
Steps:   0%|          | 0/500 [00:01<?, ?it/s, loss=0.459, lr=5e-6]
Steps:   0%|          | 1/500 [00:01<13:25,  1.61s/it, loss=0.459, lr=5e-6]
...
...
...
Steps: 100%|█████████▉| 498/500 [06:05<00:01,  1.41it/s, loss=0.268, lr=5e-6]
Steps: 100%|█████████▉| 499/500 [06:06<00:00,  1.42it/s, loss=0.268, lr=5e-6]
Steps: 100%|██████████| 500/500 [06:06<00:00,  1.42it/s, loss=0.268, lr=5e-6]

Fetching 15 files:   0%|          | 0/15 [00:00<?, ?it/s]

Fetching 15 files:  93%|█████████▎| 14/15 [00:00<00:00, 135.61it/s]
Fetching 15 files: 100%|██████████| 15/15 [00:00<00:00, 135.79it/s]
The config attributes {'feature_extractor': ['transformers', 'CLIPFeatureExtractor'], 'safety_checker': ['stable_diffusion', 'StableDiffusionSafetyChecker']} were passed to StableDiffusionPipeline, but are not expected and will be ignored. Please verify your model_index.json configuration file.

Steps: 100%|██████████| 500/500 [06:46<00:00,  1.23it/s, loss=0.268, lr=5e-6]

OS: Windows 11 64-bit
CPU: Intel Core i3 12100F
RAM: Kingston Fury DDR4 3200 16GB (8x2)
GPU: ZOTAC GAMING GEFORCE RTX 3060 TWIN EDGE OC - 12GB GDDR6
Training images: 25 at 512x512 px, all PNG

smy20011 commented 2 years ago

Does it only happen when you train with class images?

smy20011 commented 2 years ago

Also, do you mind running docker image ls in PowerShell (with admin rights) and pasting the result here?

This prints all the Docker images and their sizes.

derekleighstark commented 2 years ago

I just ran this, and the conversion failed. Any clue as to why it may have failed to convert to a model checkpoint? I'm running it again with convert unticked. I watched Task Manager, and it used 11.5 GB out of 12 GB on my GPU (an RTX 3060) while it was doing the steps, but nothing ever appeared in the output directories. I have Discord if you could help troubleshoot: Thruxus#0471

Hesounolen commented 2 years ago

Hello, I have the same issue. Everything goes fine, but when I use the model with Stable Diffusion it doesn't seem to have received any training. I ran multiple tests: 600, 1000, and 2000 steps, with and without class images. Thank you.

fredconex commented 2 years ago

Just to add: I've tried on both Windows 10 and 11 and get the same issue, with no training happening. I've also used the script to convert to ckpt myself, leaving the conversion option in dreambooth-gui disabled, just to make sure it wasn't an issue with the conversion, but I still get the same result.

SmezMorePrakezz commented 2 years ago

REPOSITORY            TAG       IMAGE ID       CREATED       SIZE
smy20011/dreambooth   latest    1468ae175524   2 days ago    14.9GB
smy20011/dreambooth   <none>    93688a23da13   2 weeks ago   14.7GB

Yes, it fails to train every time. I tried CompVis/stable-diffusion and hakurei/waifu-diffusion, with and without class images, tried quoting the instance prompt, and tried converting the diffusers model separately with its own official tool. No luck.

mocandragon commented 2 years ago

Just adding another user to this pile. Windows 10, RTX 3060. Everything runs, but the final model seems to show no difference, same as the folks above.

oatmealsoup commented 2 years ago

I am also having a similar issue. I am no programmer, so this is a shot in the dark, but I wonder if it has anything to do with the NSFW filter, because it was mentioned when I first ran the program.

oatmealsoup commented 2 years ago

The config attributes {'feature_extractor': ['transformers', 'CLIPFeatureExtractor'], 'safety_checker': ['stable_diffusion', 'StableDiffusionSafetyChecker']} were passed to StableDiffusionPipeline, but are not expected and will be ignored. Please verify your model_index.json configuration file

Grimig commented 2 years ago

It now works for me. I'm afraid I have no good answer as to what I did. However, when I reinstalled, I saw that the default installation path was set to ".\Program Files\" and thought maybe the space in the path was messing things up. I installed to a different location, and voilà, it worked. I can't think of anything else I did differently.
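For context on why a space in a path can matter at all: when a command is assembled as one string and later split on whitespace, a path containing a space breaks into two arguments. The sketch below only illustrates that general mechanism. The helper name and the path are hypothetical, and nothing in this thread confirms that dreambooth-gui builds its docker command this way.

```python
import shlex


def build_volume_arg(host_dir: str, container_dir: str) -> str:
    """Build a docker-style -v=host:container argument (illustrative helper)."""
    return f"-v={host_dir}:{container_dir}"


# Hypothetical host path containing a space.
arg = build_volume_arg("C:/Program Files/dreambooth/instance", "/instance")

# Joined into one string and split on whitespace, the path is cut in two.
naive = shlex.split("docker run " + arg)

# Passed as a list element (the form subprocess.run accepts), it stays whole.
as_list = ["docker", "run", arg]

# Quoting the argument before joining also keeps it intact.
quoted = shlex.split("docker run " + shlex.quote(arg))

print(naive)   # the -v argument is split into two pieces
print(quoted)  # same three elements as as_list
```

Passing arguments as a list avoids the problem entirely, which is why it is generally preferred over string concatenation when a path may contain spaces.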

fredconex commented 2 years ago

I'll give that a try and report back soon on whether it works for me.

Hesounolen commented 2 years ago

Thank you for keeping us posted. I tried reinstalling in a location other than the default Program Files one, and unfortunately it didn't help.

fredconex commented 2 years ago

Yeah, no luck. I uninstalled and reinstalled on a different drive, but still no training. I even changed to "runwayml/stable-diffusion-v1-5" and it's not learning a thing. It downloaded more files, from what I remember, but the hash is still always [e02601f3].

fredconex commented 2 years ago

Well, I give up. I've tried so many different things: running as admin, using a different folder, and adding some extra parameters to see if I could get any change. No matter what, it's always the same result, so either it's not loading the images for training or something else is wrong.

smy20011 commented 2 years ago

Hey, I did a new pre-alpha release of dreambooth-gui. It uses a new xformers build that may support the 3060. Do you mind trying it? https://github.com/smy20011/dreambooth-gui/releases/tag/v0.1.6

Hesounolen commented 2 years ago

Hello smy20011, I've tested it and it worked with my 3070, thanks a lot

smy20011 commented 2 years ago

Let's goooooo. @fredconex Do you mind giving it a try?

mocandragon commented 2 years ago

I would also like to say that the new version works for me as well! Thank you for your hard work, smy20011.

fredconex commented 2 years ago

I'm giving it a go. I did a training run with 600 steps, but it didn't seem to work (not fully): a car I trained showed some leaning toward what I was training, but it was still miles away. Now I'm training with 1800 steps, the same number I used on Colab, where it worked quite well. If it's still not showing after this, then it might still be broken.

I've uninstalled the old 0.1.5 and installed the new 0.1.6. I even went to the Roaming folder and deleted the .hub stuff so it would download fresh files. It's training right now; once it completes I'll report back on whether it worked. Thanks in advance for your work.

Papricatia commented 2 years ago

thanks, it works for me too!

fredconex commented 2 years ago

Well, some of my messages seem to have disappeared?!

But it's working now. Thanks a lot, @smy20011. The only request I'd like to make is to be able to save the current settings as defaults, or to modify them somewhere.

SmezMorePrakezz commented 2 years ago

0.1.6 is working now on my RTX3060 12GB. Thank you very much.

Also, adding --gradient_checkpointing makes training so much faster that it turns s/it into it/s.

smy20011 commented 2 years ago

I think the new version solves this problem, so I'm closing this issue.