invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: Much Slower than DiffusionBee on M1 Mac #2606

Closed · stevemac00 closed this issue 1 year ago

stevemac00 commented 1 year ago

Is there an existing issue for this?

OS

macOS

GPU

mps

VRAM

No response

What happened?

I tried DiffusionBee as a "benchmark".

Prompt: waterfall in mountainous forest, rainbow, sun, clouds, landscape, zoom out, photorealistic, f/16

Both using --steps=25 --width=512 --height=512 --cfg_scale=7.5. Note this is a low-end M1 iMac with 8GB RAM, and I recall InvokeAI recommends 16GB.

DiffusionBee render time: 1+s/it, 46s
InvokeAI render time: 10-13s/it, 295.04s using --sampler=k_lms
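For an apples-to-apples number outside either app, a minimal timing sketch with plain diffusers on MPS using the same settings looks roughly like this. It assumes diffusers, transformers, and a PyTorch build with MPS support are installed; it is not Invoke's code path:

```python
# Independent s/it sanity check with plain diffusers on MPS; a sketch
# using the reporter's settings, not Invoke's internals.
import time

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
).to("mps")
pipe.enable_attention_slicing()  # reduces peak memory on low-RAM Macs

prompt = ("waterfall in mountainous forest, rainbow, sun, clouds, "
          "landscape, zoom out, photorealistic, f/16")

start = time.time()
image = pipe(prompt, num_inference_steps=25, width=512, height=512,
             guidance_scale=7.5).images[0]
elapsed = time.time() - start
print(f"{elapsed:.1f}s total, {elapsed / 25:.2f}s/it")
image.save("benchmark.png")
```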

I have patchmatch working, but I still get this warning: embedding_manager.py:146: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU.

I've seen a few references to this but no solution that works for me. Could this be the cause of the slowness difference?
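A minimal sketch to confirm what that warning means, assuming a recent PyTorch built with MPS support: the warning is emitted when PYTORCH_ENABLE_MPS_FALLBACK is set and an op without an MPS kernel runs on the CPU instead.

```python
# Minimal check of the MPS backend and the CPU-fallback behaviour behind
# that warning. The env var must be set before torch is imported; without
# it, an unsupported op raises an error instead of falling back.
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

device = "mps" if torch.backends.mps.is_available() else "cpu"
x = torch.tensor([0.0, 1.0, 0.0], device=device)
# On PyTorch builds where aten::nonzero has no MPS kernel, this line
# triggers the same UserWarning and runs on the CPU.
print(torch.nonzero(x))
```

Since that embedding_manager call happens during prompt conditioning rather than inside the per-step sampling loop, the fallback on its own is unlikely to account for a 10x per-step gap.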

Screenshots

Note the radical difference in GPU History.

[Screenshot: composite of GPU History while running DiffusionBee vs. InvokeAI]

Additional context

I realize these apps are different and shouldn't be compared directly, but I feel that if I could max out the GPU I could reduce my s/it time.

Contact Details

No response

Adreitz commented 1 year ago

Are you running the fp16 or fp32 version of DiffusionBee? fp16 will run much faster, but is not supported yet in Invoke on Mac. What model are you using for both DB and Invoke? Make sure you're testing the same model, as different models will have different inference speeds.

Also, please use Activity Monitor to check the memory use during execution. If either program fills the RAM and starts using virtual memory, execution speed will drop sharply. In my experience, the Invoke 2.3.0 release candidate has a severe memory issue, using much more memory than 2.2.5.
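If you want numbers rather than the pressure graph, a rough probe like this can run in a second terminal during a generation. It is a sketch assuming psutil is installed; matching on the process name "python" is an assumption, so use whatever name Activity Monitor shows:

```python
# Rough memory probe to run while a generation is in flight.
import time

import psutil

def snapshot(name_substr: str = "python") -> None:
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    print(f"RAM used: {vm.used / 2**30:.1f} / {vm.total / 2**30:.1f} GiB, "
          f"swap used: {sw.used / 2**30:.1f} GiB")
    for p in psutil.process_iter(["name", "memory_info"]):
        mi = p.info["memory_info"]
        if mi and name_substr in (p.info["name"] or "").lower():
            print(f"  pid {p.pid} ({p.info['name']}): RSS {mi.rss / 2**30:.1f} GiB")

# Sample every 5 seconds for 5 minutes; an RSS near or above physical RAM
# means the diffusion loop is almost certainly paging.
for _ in range(60):
    snapshot()
    time.sleep(5)
```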

stevemac00 commented 1 year ago

I am using 1.5.1 (0016), which I interpret as fp16. I understand your point, but don't you think I should see better usage in GPU History?

Adreitz commented 1 year ago

If Invoke is swapping in/out of virtual memory, it will not use the GPU to full effect. That's why you need to check the memory use.

Adreitz commented 1 year ago

Note that fp16 also uses less VRAM than fp32, so that by itself could be contributing to the difference you see.
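To put rough numbers on that, here is an illustrative sketch with plain diffusers (not Invoke's internals; assumes diffusers and PyTorch are installed) that prints the UNet weight footprint at each precision:

```python
# The same UNet's weight footprint at fp32 vs fp16: halving the bytes
# per parameter is why a fp16 build fits in 8GB far more comfortably.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
)

n_params = sum(p.numel() for p in pipe.unet.parameters())
print(f"UNet parameters: {n_params / 1e6:.0f}M")
print(f"fp32 weights: {n_params * 4 / 2**30:.2f} GiB")
print(f"fp16 weights: {n_params * 2 / 2**30:.2f} GiB")

# Loading directly in half precision (what a fp16 build does) would be:
# pipe16 = StableDiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
```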

stevemac00 commented 1 year ago

Yes, both are using the SD 1.5 model. Memory pressure is almost always yellow with an occasional thin red spike. It may be slightly higher with InvokeAI, but it's close to the same as when running DiffusionBee.

Adreitz commented 1 year ago

The memory pressure graph is not a precise measurement, but if you're seeing any red at all, you're probably swapping. Look at the actual memory use by Python. You can also look at the drive reads by Python on the Disk tab, which will show the page-ins. (Note that it will always show at least 4-5GB read, since it needs to load the model, so look at the change in this value after you start a generation.) I think the page-outs are written by kernel_task, but that handles a lot of other things as well, so it is not as reliable an indicator.
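The same counters are exposed on the command line by vm_stat; a hypothetical helper like this (the function name is mine, nothing Invoke-specific) can diff them around a single generation:

```python
# Diff macOS page-in/page-out counters from `vm_stat` around a
# generation. A large jump while generating is direct evidence of swapping.
import re
import subprocess

def vm_counters() -> dict[str, int]:
    out = subprocess.run(["vm_stat"], capture_output=True, text=True).stdout
    return {key: int(m.group(1))
            for key in ("Pageins", "Pageouts")
            if (m := re.search(rf"{key}:\s+(\d+)", out))}

before = vm_counters()
input("Start the generation, wait for it to finish, then press Enter... ")
after = vm_counters()
for key, start in before.items():
    # vm_stat counts pages (16 KiB on Apple silicon, per its header line)
    print(f"{key}: +{after[key] - start} pages")
```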

stevemac00 commented 1 year ago

Python was using over 8GB, so there was swapping.

BigZampano commented 1 year ago

DiffusionBee is also swapping like crazy, showing even more RAM usage than is actually installed (like "15 GB of 8 GB used"), and after switching models 2-3 times the swap file is over 20 GB... but the only thing that stops DB from chugging along nicely at 2.2s/it is when my disk space for swap is full. It doesn't even slow down when I use the Mac for work at the same time. InvokeAI should be able to get up to speed on M1; DiffusionBee shows it's possible.

mallocator commented 1 year ago

The weird thing is that a few versions ago it worked just as fast on an M1 out of the box. Now it's taking about an hour to generate an image with default settings. Guess I'm going back to DiffusionBee until this is fixed :(

elsung commented 1 year ago

I'm experiencing the same thing. I'd love to use Invoke again, but right now it's basically unusable. I had an old version running from several months ago and it was decent. The older version used update.sh; when that didn't work, I realized there was an installer now and went with that. My render times went from 2 minutes to 10 minutes for 1 image.

Are there plans to fix this for users who are also on M1/M2? Or at least to clean up the installation documentation? When I view the installation instructions under the Mac tab, it displays instructions and screens for Windows.

Lastly, there seems to be even more confusion around installation, since you can pull the git repo or install via pip... which is the proper / most up-to-date method?

Adreitz commented 1 year ago

@elsung As I suggested above, check the memory use of Python when you run a generation. If you run out of memory and start using swap, the diffusion calculations will be rate-limited by your SSD's I/O.

Invoke has never been particularly memory efficient on Mac, but I noticed it got significantly worse after I updated to macOS 13.3. 13.3 fixed a bunch of bad crashing bugs with SD, as well as a corrupted-output bug, but it seems to have inflated the memory use.

I am trying to take comparative data using Invoke 2.2.5 and 2.3.4 against data I had taken on Ventura 13.2.1, but it's going slowly due to my limited time.

rovo79 commented 1 year ago

Very slow generation on a Mac mini M1 16GB with InvokeAI 2.3.5.

I was able to follow the command-line installation, but ran into a few errors with the 'greenlet' package. I got it working by pinning two packages manually: setuptools~=65.5 and pip~=22.3.

Images are taking 10-12 minutes to generate. I left it set to 50 steps.

Invoke startup:

InvokeAI, version 2.3.5
InvokeAI runtime directory is "/Users/rob/invokeai"
GFPGAN Initialized
CodeFormer Initialized
ESRGAN Initialized
Using device_type mps
xformers not installed
NSFW checker is disabled
Current VRAM usage: 0.00G
Loading diffusers model from runwayml/stable-diffusion-v1-5
| Using more accurate float32 precision
| Loading diffusers VAE from stabilityai/sd-vae-ft-mse
| Using more accurate float32 precision
| Calculating sha256 hash of model files
| sha256 = 9221ee79e19a30c7efd30401444a12593621e37c337f980c49078cff5f6d4ede (27 files hashed in 6.53s)
| Default image dimensions = 512 x 512
Model loaded in 16.61s
Textual inversion triggers:
Setting Sampler to k_lms (LMSDiscreteScheduler)
ESRGAN Parameters: False
Facetool Parameters: False

Usage stats: 1 image(s) generated in 1331.75s

Tried other samplers:

Setting Sampler to ddim (DDIMScheduler)
1 image(s) generated in 624.36s