Closed - mkabatek closed this issue 1 year ago
Hi, did you ever resolve this? If not, can you paste from the start of the SD thread kickoff to the end of the error (e.g. look for the line that starts with [cuda:x] >> starting job #zzzz: python.... and paste everything that follows that to the end of the error trace).
Are the images from the first 2 GPUs being created properly? If none are, is this potentially a permission issue (make sure the Dream Factory/SD scripts have permission to create directories)?
@rbbrdckybk thanks for your reply. I'm trying to narrow down what the issue is. I have upgraded the CPU on the computer to an i7 6600 (8 threads) because I thought the processor might be underpowered; however, the same issue persists.
Here is what I have done to debug: I have 6 GPUs on this machine, so I set `export CUDA_VISIBLE_DEVICES=0,1` to enable just two of them and see if that works. However, only one of them will generate images; the other gets stuck on `+exif data`.
If I set `export CUDA_VISIBLE_DEVICES=0` to use only the 0th CUDA device, the program runs fine. I don't see any other errors in the stack trace.
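As a side note, a quick sanity check like this (just a throwaway snippet, not part of Dream Factory) confirms which devices PyTorch actually sees after exporting CUDA_VISIBLE_DEVICES - it should report 2 devices with 0,1 and 1 device with just 0:

```python
# Sanity check: PyTorch only sees the devices listed in CUDA_VISIBLE_DEVICES,
# re-indexed from 0. Run this in the same shell after exporting the variable.
import torch

print(f"visible CUDA devices: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```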
[cuda:0] >>> starting job #1: python scripts_mod/optimized_txt2img.py --turbo --skip_grid --n_iter 2 --n_samples 1 --prompt "a cute robot, art by Aleksi Briclot, ethereal" --ddim_steps 50 --scale 7.5 --W 512 --H 512 --sampler plms --seed 394308370 --outdir "../output/2022-12-03-example-standard"
[cuda:1] >>> starting job #1: python scripts_mod/optimized_txt2img.py --turbo --device "cuda:1" --skip_grid --n_iter 2 --n_samples 1 --prompt "a cute robot, art by Aleksi Briclot, full of details" --ddim_steps 50 --scale 7.5 --W 512 --H 512 --sampler plms --seed 3594534163 --outdir "../output/2022-12-03-example-standard"
Global seed set to 394308370
Global seed set to 3594534163
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
Global Step: 470000
/home/lol/anaconda3/envs/dream-factory/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v1.10.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
/home/lol/anaconda3/envs/dream-factory/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v1.10.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
UNet: Running in eps-prediction mode
UNet: Running in eps-prediction mode
Exception in thread Thread-14:
Traceback (most recent call last):
File "/home/lol/anaconda3/envs/dream-factory/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/lol/dream-factory/dream-factory.py", line 143, in run
new_files = os.listdir(samples_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'output/2022-12-03-example-standard/gpu_1'
It looks like the model is failing to load and the SD process is exiting silently; control then kicks back to Dream Factory, which expects SD to have created the per-GPU output folder (hence the FileNotFoundError you're seeing).
If you're comfortable editing dream-factory/stable-diffusion/scripts_mod/optimized_txt2img.py, you can add a couple of simple print statements before & after line #200 (i.e. directly before/after `sd = load_model_from_config(f"{opt.ckpt}")`) so we can either confirm or rule this out. Something like this:
print('loading model now...')
sd = load_model_from_config(f"{opt.ckpt}")
print('finished loading model!')
Then try to re-launch Dream Factory. I'd expect that you'll see the 'loading model now...' output, but SD will silently fail before the 'finished loading model!' text appears (at least on the GPU(s) that aren't working).
If that's the case, I'm unfortunately not sure why PyTorch is failing to load the model beyond the first GPU, and I'm equally clueless about how to resolve it. After some Googling, I suspect it's a system RAM issue, but you have 20GB, which should be enough (unless you're allocating a large portion to something else?). You can try creating a large swap file to see if that resolves the error, but I don't recommend running that way long-term (it'll be very slow and will wear out your SSD pretty quickly).
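If you want to confirm the RAM theory before resorting to swap, a rough monitor like the one below (psutil is an extra install, not a Dream Factory dependency) can be left running in a second terminal while the workers start up:

```python
# Rough free-RAM monitor; requires psutil (pip install psutil).
# If available memory bottoms out while the models load, that's the culprit.
import time
import psutil

for _ in range(60):                              # sample for ~5 minutes
    avail_gb = psutil.virtual_memory().available / (1024 ** 3)
    print(f"available system RAM: {avail_gb:5.1f} GB")
    time.sleep(5)
```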
Thanks again for your feedback. I added the code to the following file:
dream-factory/stable-diffusion/scripts_mod/optimized_txt2img.py
I can see that both models load properly - so it didn't fail on model loading. Here is the output.
I also only enabled 2 GPUs to narrow the scope for easier debugging.
(dream-factory) lol@lol-H110-D3A:~/dream-factory$ python dream-factory.py
[controller] >>> reading configuration from config.txt...
[controller] >>> starting webserver (http://localhost:8080/) as a background process...
[controller] >>> detected 2 total GPU device(s)...
[controller] >>> initialized worker 'cuda:0': NVIDIA GeForce RTX 3070
[controller] >>> initialized worker 'cuda:1': NVIDIA GeForce RTX 3060 Ti
[controller] >>> No more work in queue; waiting for all workers to finish...
[controller] >>> All work done; pausing server - add some more work via the control panel!
[controller] >>> clearing work queue...
[controller] >>> queued 189 work items.
[controller] >>> Un-pausing; workers will resume working...
[cuda:0] >>> starting job #1: python scripts_mod/optimized_txt2img.py --turbo --skip_grid --n_iter 2 --n_samples 1 --prompt "a cute robot, art by Aleksi Briclot, ethereal" --ddim_steps 50 --scale 7.5 --W 512 --H 512 --sampler plms --seed 2178210710 --outdir "../output/2022-12-05-example-standard"
[cuda:1] >>> starting job #1: python scripts_mod/optimized_txt2img.py --turbo --device "cuda:1" --skip_grid --n_iter 2 --n_samples 1 --prompt "a cute robot, art by Aleksi Briclot, full of details" --ddim_steps 50 --scale 7.5 --W 512 --H 512 --sampler plms --seed 2592175028 --outdir "../output/2022-12-05-example-standard"
Global seed set to 2592175028
Global seed set to 2178210710
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
Global Step: 470000
/home/lol/anaconda3/envs/dream-factory/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v1.10.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
/home/lol/anaconda3/envs/dream-factory/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v1.10.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
UNet: Running in eps-prediction mode
UNet: Running in eps-prediction mode
Exception in thread Thread-14:
Traceback (most recent call last):
File "/home/lol/anaconda3/envs/dream-factory/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/lol/dream-factory/dream-factory.py", line 143, in run
new_files = os.listdir(samples_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'output/2022-12-05-example-standard/gpu_1'
[controller] >>> Pause requested; workers will finish current work and then wait...
CondStage: Running in eps-prediction mode
FirstStage: Running in eps-prediction mode
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using prompt: a cute robot, art by Aleksi Briclot, ethereal
Sampling: 0%| | 0/2 [00:00<?, ?it/sseeds used = [2178210710] | 0/1 [00:00<?, ?it/s]
Data shape for PLMS sampling is [1, 4, 64, 64]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|█████████████████████████████| 50/50 [00:10<00:00, 4.73it/s]
torch.Size([1, 4, 64, 64])
saving images 100%|█████████████████████████████| 50/50 [00:10<00:00, 5.55it/s]
memory_final = 2.575872
data: 100%|███████████████████████████████████████| 1/1 [00:25<00:00, 25.05s/it]
Sampling: 50%|█████████████████▌ | 1/2 [00:25<00:25, 25.05s/itseeds used = [2178210711] | 0/1 [00:00<?, ?it/s]
Data shape for PLMS sampling is [1, 4, 64, 64]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|█████████████████████████████| 50/50 [00:09<00:00, 5.48it/s]
torch.Size([1, 4, 64, 64])
saving images 100%|█████████████████████████████| 50/50 [00:09<00:00, 5.43it/s]
memory_final = 2.575872
data: 100%|███████████████████████████████████████| 1/1 [00:20<00:00, 20.97s/it]
Sampling: 100%|███████████████████████████████████| 2/2 [00:46<00:00, 23.01s/it]
Samples finished in 3.56 minutes and exported to ../output/2022-12-05-example-standard/gpu_0
Seeds used = 2178210710,2178210711
[cuda:0] >>> finished job #1 in 219.2 seconds.
However, I did notice that the output directory passed into the script from the web UI doesn't include the `gpu_x` part, so the files get generated into that main dated output directory. Maybe the problem is that the `gpu_x` directory isn't created in time for the files to be copied over into it.
Thanks again - hopefully we can narrow this problem down and get it working. I'm happy to code, debug, and work through any issues - so if you have any guidance that would be great.
Also, FWIW, this machine has nothing else running on it - I'm setting it up just to run Stable Diffusion/Dream Factory: 1TB SSD, 20GB RAM, 6x NVIDIA GPUs.
Did some investigation on my own Linux machine: monitoring memory usage via top, it looks like SD uses almost 10GB of system RAM when loading the model (the checkpoint goes through system RAM before being loaded into each GPU's VRAM). Knowing that, it makes sense that only 2 of your GPUs aren't producing errors - with 20GB of system RAM, SD silently fails on the model load once loading the first two models has maxed out your memory.
I never noticed the high system memory usage before, and the SD forks I'm using aren't designed to minimize it (obviously loading the same model into memory more than once doesn't make sense; I'd need to re-write things to load it into memory once and then use that single copy to initialize each GPU - or at minimum, stagger the loading at startup so that all GPUs aren't trying to perform the load at once).
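For reference, the staggering idea would look roughly like this - just a sketch of the concept, not actual Dream Factory code, with a simulated load standing in for SD's model loader:

```python
# Rough sketch of the staggering idea: serialize the RAM-heavy checkpoint
# load so only one worker pulls the ~10 GB model into system memory at a time.
import threading
import time

load_lock = threading.Lock()

def load_checkpoint(ckpt_path):
    # Stand-in for SD's real loader (load_model_from_config); simulated here.
    time.sleep(5)
    return f"model loaded from {ckpt_path}"

def worker(gpu_id, ckpt_path):
    with load_lock:                        # only one load in flight at a time
        model = load_checkpoint(ckpt_path)
    print(f"cuda:{gpu_id}: {model}, starting jobs...")

threads = [threading.Thread(target=worker,
                            args=(i, "models/ldm/stable-diffusion-v1/model.ckpt"))
           for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```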
I'm actually in the process of completely replacing the backend of Dream Factory with a more up-to-date repo, rather than trying to cobble updates onto the unsupported ones that I'm currently using. It'll probably take me another day or two to get something usable onto GitHub, but I'll reply here when it's ready for testing. Given that you have so many GPUs, it'd be great to have you test it when it's ready!
Awesome, just let me know - I'm happy to test any time. I can also try increasing the swap space to something very high and see if it can run.
What I've found is that the error happens in the code below, which from what I understand handles saving the metadata and moving the files. But yes, if it is loading the model 6 times into system RAM, that would make sense. If I comment out the code below, the system locks up completely but doesn't error out; if it worked, this would effectively just leave the files in the main output directory.
Anyways, thanks for your feedback - if there is anything I can do to help, let me know.
```python
else:
    new_files = os.listdir(samples_dir)
    nf_count = 0
    for f in new_files:
        if (".png" in f):
            # save just the essential prompt params to metadata
            meta_prompt = command.split(" --prompt ",1)[1]
            meta_prompt = meta_prompt.split(" --outdir ",1)[0]
            if 'seed_' in f:
                # grab seed from filename
                actual_seed = f.replace('seed_', '')
                actual_seed = actual_seed.split('_',1)[0]
                # replace the seed in the command with the actual seed used
                pleft = meta_prompt.split(" --seed ",1)[0]
                pright = meta_prompt.split(" --seed ",1)[1].strip()
                meta_prompt = pleft + " --seed " + actual_seed

            upscale_text = ""
            if self.command['use_upscale'] == 'yes':
                upscale_text = " (upscaled "
                upscale_text += str(self.command['upscale_amount']) + "x via "
                if self.command['upscale_face_enh'] == 'yes':
                    upscale_text += "ESRGAN/GFPGAN)"
                else:
                    upscale_text += "ESRGAN)"

            pngImage = PngImageFile(samples_dir + "/" + f)
            im = pngImage.convert('RGB')
            exif = im.getexif()
            exif[0x9286] = meta_prompt
            exif[0x9c9c] = meta_prompt.encode('utf16')
            exif[0x9c9d] = ('AI art' + upscale_text).encode('utf16')
            exif[0x0131] = "https://github.com/rbbrdckybk/dream-factory"
            newfilename = dt.now().strftime('%Y%m-%d%H-%M%S-') + str(nf_count)
            nf_count += 1
            im.save(output_dir + "/" + newfilename + ".jpg", exif=exif, quality=88)
            if exists(samples_dir + "/" + f):
                os.remove(samples_dir + "/" + f)
```
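As a possible band-aid on my end (just a sketch, not necessarily the right fix), the listdir call could be guarded so a missing per-GPU directory doesn't kill the worker thread:

```python
# Sketch: guard the listdir call so a missing per-GPU directory (e.g. when the
# SD subprocess died before creating it) doesn't raise FileNotFoundError.
import os

samples_dir = "output/2022-12-05-example-standard/gpu_1"  # example path from the trace above

if os.path.isdir(samples_dir):
    new_files = os.listdir(samples_dir)
else:
    print(f"warning: {samples_dir} does not exist yet; skipping this pass")
    new_files = []
```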
More testing: I increased the swap size to 64GB, so you were correct about the model loading. I am no longer getting the error:
FileNotFoundError: [Errno 2] No such file or directory: 'output/2022-12-05-example-standard/gpu_1'
However, now there is a different problem: the program seems to run fine, but only one of the GPUs is completing jobs. Again, I limited the system to two GPUs with `export CUDA_VISIBLE_DEVICES=0,1`.
You can see in the screenshot below that jobs completed is 7, and they were all completed on cuda:0. cuda:1 is stuck, as you can see by the running time.
I reverted the code to the original state from the repo. It looks like the folders are now created, but the images are not moved into the proper directory. I also see the `gpu_0` directory being deleted and recreated over and over.
Note that I have `!SAMPLES = 1` (number of images to generate per prompt) in my prompts file, so that each GPU only generates one image per run.
Hopefully this info is helpful.
Thanks for the help - there is definitely a bug somewhere in multi-GPU generation. I managed to borrow a 3-GPU machine for testing and I can replicate the issue you're seeing.
I have an early prototype of a new backend working (hooking into the popular automatic1111 repo). I implemented staggered GPU initialization to avoid potential memory issues, and I'm currently generating images on all 3 GPUs simultaneously with no issues (on a server with 16GB of system RAM and a single-core AMD Sempron from 2013!).
I'll need a couple days to integrate everything back into the Dream Factory front-end but I should hopefully have a new version out by the middle of the week.
Awesome! Please let me know if there is anything else I can do to help. Looking forward to the release.
Just pushed the latest code up - I'm pretty confident that everything will work for you, though I do recommend a fresh install. Full changelog is here. You'll also need to set up Auto1111's repo as part of the setup process now, although that shouldn't take more than a few minutes.
Let me know how it goes for you!
@rbbrdckybk works perfectly now - it spins up pretty quickly: ~450 images in 10 minutes (6 GPUs). Amazing work, thank you. If you could elaborate on how to use Stable Diffusion 2, that would be great - I'm assuming I just put the SD2 model in the Auto1111 model folder?
Also, if you could guide me on how I might modify the config file to start with images instead of prompts, that would be awesome too. Either way, I'm going to start working with this, and if I have any significant updates I will create a PR.
Thanks again!
@mkabatek Glad to hear it's working!
Yup, just put whatever model(s) you want to use in the Auto1111 model folder. With SD 2.0 models, you'll also need to place the .yaml file alongside it (named the same as the .ckpt). Lots of reports of SD 2.1 not working properly with Auto1111 yet, so I haven't tried it, but SD 2.0 works fine for me. Restart Dream Factory after the model file(s) are in the proper place, create a new prompt file via the web editor, and you should see your model(s) listed in the commented area near the top.
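If it's helpful, a throwaway check like the one below (not part of Dream Factory; the folder path is an assumption, so adjust it to wherever your Auto1111 install lives) will flag any .ckpt that's missing its matching .yaml:

```python
# Throwaway check: flag .ckpt files in the Auto1111 model folder that don't
# have a matching .yaml next to them. Adjust model_dir to your install.
import os

model_dir = "stable-diffusion-webui/models/Stable-diffusion"

for name in sorted(os.listdir(model_dir)):
    if name.endswith(".ckpt"):
        yaml_twin = os.path.splitext(name)[0] + ".yaml"
        status = "ok" if os.path.isfile(os.path.join(model_dir, yaml_twin)) else "no matching .yaml"
        print(f"{name}: {status}")
```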
Not sure what you mean by "start with images instead of prompts"?
Awesome, going to download and try SD2 now.
> Not sure what you mean by "start with images instead of prompts"?
Well, there is a function/script, img2img, which takes an image as input (and optionally a prompt) and generates a similar image, or uses the image as a starting point. I'm ultimately going to attempt to get this function working with dream-factory, so I can feed Dream Factory an image and have the GPUs generate similar images in parallel based on the initial image (instead of permutations of the prompt).
Either way thanks for your help this is awesome!
@mkabatek oh my bad, img2img is already implemented! Just create a new prompt file and take a look at these 2 settings:
!INPUT_IMAGE = # can specify an input image here (output image will be same resolution)
!STRENGTH = 0.75 # strength of input image influence (0-1, with 1 corresponding to least influence)
Just set !INPUT_IMAGE= to whatever image you want to use as the starting image (paths relative to the Dream Factory folder or absolute paths are both fine). !STRENGTH is the denoising strength. In standard prompt files, you can put both of these in the [prompts] sections too, in case you want to set up switches between multiple input images. Setting !INPUT_IMAGE= by itself will clear out any previously set input image.
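For example (hypothetical image paths; the directives are just the ones above), a couple of [prompts] sections switching between two input images might look like:

```
[prompts]
!INPUT_IMAGE = input/robot-sketch.png
!STRENGTH = 0.75
a cute robot, art by Aleksi Briclot, ethereal

[prompts]
!INPUT_IMAGE = input/robot-photo.png
!STRENGTH = 0.60
a cute robot, art by Aleksi Briclot, full of details
```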
I really need to get some documentation on the github page. :rofl:
@rbbrdckybk sorry, new problem now. When I set an image file with an absolute path, the terminal outputs the following:
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
[controller] >>> queued 0 work items.
I'm not sure what's going on here - I do have one prompt set up with this, but I'm not sure how to get it to work properly.
Sorry, I think I figured it out - it doesn't play nice with an empty prompt section.
Ah, if an empty [prompts] section causes issues, that's a bug. If you find any more, definitely feel free to open issues so I can address them!
@mkabatek Fixed the issue with empty [prompts] sections in the latest version. Going to go ahead and close this, but feel free to open up any new issues as you find them!
Hello,
I'm attempting to run dream-factory on an Ubuntu machine. It appears to be running - the web interface comes up and all GPUs show the status `dreaming` - however, after some time the following errors start to get thrown. It appears to be something wrong with the folder creation. After a bit more time, only two of the devices (which seem to be random) continue running; the other devices get stuck in the `+exif data` state, while the two keep `dreaming` and generating files. The files, however, are not stored/copied into the proper per-GPU directories (e.g. `gpu_0`); they remain in the root of the `output/<date-config_name>` directory.
I will attempt to debug this and make changes, however if you have any idea or guidance about what could cause this, that would be helpful. I really like this project and would like to contribute.
Please see the error below, followed by a screenshot and system info.
Below are the system specs
CUDA devices