Closed · smblee closed this issue 2 years ago
PYTORCH_ENABLE_MPS_FALLBACK is already set within the environment in Anaconda, and does not seem to cover the operator 'aten::nonzero'.
Followed the M1 instructions on macOS 12.5 with Python 3.10.4.
```
.../stable-diffusion/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/miniforge3/conda-bld/pytorch-recipe_1660136240338/work/aten/src/ATen/mps/MPSFallback.mm:11.)
```
and
```
  File ".../stable-diffusion/ldm/modules/embedding_manager.py", line 155, in forward
    embedded_text[placeholder_idx] = placeholder_embedding
NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```
I could try `PYTORCH_ENABLE_MPS_FALLBACK`, but is that how people are getting around this issue?
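For context, the way this workaround is usually applied is to have the variable in place before torch is imported, either with `export PYTORCH_ENABLE_MPS_FALLBACK=1` in the shell that launches the script or at the very top of the entry script. A minimal sketch (not code from this repo; the placement before `import torch` is the important part):

```python
# Minimal sketch (not code from this repo): the fallback flag is read when torch
# is imported, so it has to be in the environment before the import happens.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # imported only after the variable is set

print(torch.__version__)
```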
I'm having the same issue; it falls back to using the CPU. Please update if you find a fix.
After bumping into the same issue (and finding this thread), I updated pytorch to the nightly build and it worked.
@lstein @magnusviri, would it make sense to list pytorch-nightly in the environment-mac.yml?
Same issue with torch 1.13.0.dev20220901
Confirming that the issue persists with the latest pytorch nightly, 1.13.0.dev20220901. It also looks like the `aten::nonzero` op hasn't been implemented for the MPS backend in pytorch yet.
@adelsz does this mean that for SD on M1 with MPS it's just a matter of time, or is there a workaround?
SD on M1 works fine. Use the environment-mac.yaml when creating your python environment with conda/mamba. I am running it right now on my M1 MacBook Pro. The warning containing `aten::nonzero` is still present, but the image generation works fine.
It works, but the warning implies the inference is run on the CPU rather than the GPU.
Yes, at least for whatever part of the code uses nonzero. My Mac's GPU seems to be under 100% load during calls to SD, however. (See Activity Monitor -> Window -> GPU History.)
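If Activity Monitor is inconclusive, a quick sanity check from Python confirms that the MPS backend is usable and that tensors actually land on it (a generic snippet, not part of this repo's scripts):

```python
import torch

# Sanity checks that this PyTorch build supports MPS and that the device is usable.
print(torch.backends.mps.is_built())      # compiled with MPS support?
print(torch.backends.mps.is_available())  # MPS device actually usable on this machine?

if torch.backends.mps.is_available():
    # A tensor created on "mps" should report device mps:0. The fallback only routes
    # individual unsupported ops through the CPU, not the whole computation.
    x = torch.randn(3, device="mps")
    print(x.device)
```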
> SD on M1 works fine. Use the environment-mac.yaml when creating your python environment with conda/mamba. I am running it right now on my M1 MacBook Pro. The warning containing `aten::nonzero` is still present, but the image generation works fine.
For me it isn't so: I get the warning and it does fall back to the CPU, so generation time becomes very long, and I've never seen it get past 20% with 1 iteration and 5 steps. How can I stop it from falling back to the CPU?
M1 MBP 2020
Same here. Any tips on how to debug this?
```
$ git log
commit 751283a2de81bee4bb571fbabe4adb19f1d85b97 (HEAD -> main, origin/main, origin/HEAD)
Author: Kevin Gibbons <bakkot@gmail.com>
Date:   Sat Sep 3 23:34:20 2022 -0700

$ conda info
     active environment : ldm
    active env location : /Users/u/miniconda3/envs/ldm
            shell level : 1
       user config file : /Users/u/.condarc
 populated config files : /Users/u/.condarc
          conda version : 4.12.0
    conda-build version : not installed
         python version : 3.9.12.final.0
       virtual packages : __osx=12.5.1=0
                          __unix=0=0
                          __archspec=1=arm64
       base environment : /Users/u/miniconda3 (writable)
      conda av data dir : /Users/u/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-arm64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/u/miniconda3/pkgs
                          /Users/u/.conda/pkgs
       envs directories : /Users/u/miniconda3/envs
                          /Users/u/.conda/envs
               platform : osx-arm64
             user-agent : conda/4.12.0 requests/2.27.1 CPython/3.9.12 Darwin/21.6.0 OSX/12.5.1
                UID:GID : 501:20
             netrc file : None
           offline mode : False

$ conda list | grep torch
pytorch                   1.13.0.dev20220903          py3.9_0    pytorch-nightly
pytorch-lightning         1.6.5                 pyhd8ed1ab_0    conda-forge
torch-fidelity            0.3.0                       pypi_0    pypi
torchdiffeq               0.2.3                       pypi_0    pypi
torchmetrics              0.9.3                 pyhd8ed1ab_0    conda-forge
torchvision               0.14.0.dev20220903         py39_cpu    pytorch-nightly
```
```
$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 12.5.1 (21G83)
      Kernel Version: Darwin 21.6.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,3
      Chip: Apple M1 Pro
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 16 GB
      Activation Lock Status: Enabled
```
> Any tips on how to debug this?
I can't see anything obviously wrong with your log.
I installed using mamba (only because it's faster, but I guess theoretically this could impact it).
I've just run `git pull` and tried the whole installation process again, starting with `conda env create -f environment-mac.yaml`.
If you want to try an alternative, I've exported my environment file here. Copy it to a file called `thomasaarholt_env.yml`.
Create a new environment with `conda env create -f thomasaarholt_env.yml` (or `mamba env ...`).
Then I linked (or copied) the model downloaded from huggingface, and ran `python scripts/preload_models.py` followed by:
```
❯ python scripts/dream.py --full_precision # I just tested, and the --full_precision argument doesn't appear necessary
* Initializing, be patient...
>> cuda not available, using device mps
>> Loading model from models/ldm/stable-diffusion-v1/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using slower but more accurate full-precision math (--full_precision)
>> Setting Sampler to k_lms
>> model loaded in 9.38s
* Initialization done! Awaiting your command (-h for help, 'q' to quit)
dream> A monkey hacking into the NSA
```
@thomasaarholt I have the same problem as OP. I just tried your `thomasaarholt_env.yml` and got:
```
The following specifications were found to be incompatible with your system:

  - feature:/osx-arm64::__osx==12.4=0
  - feature:/osx-arm64::__unix==0=0
  - feature:|@/osx-arm64::__osx==12.4=0
  - feature:|@/osx-arm64::__unix==0=0
  - ipykernel==6.15.2=pyh736e0ef_0 -> __osx
  - ipykernel==6.15.2=pyh736e0ef_0 -> ipython[version='>=7.23.1'] -> __linux
  - ipykernel==6.15.2=pyh736e0ef_0 -> ipython[version='>=7.23.1'] -> __win
  - ipython==8.4.0=pyhd1c38e8_1 -> __osx
  - ipywidgets==8.0.2=pyhd8ed1ab_0 -> ipykernel[version='>=4.5.1'] -> __linux
  - ipywidgets==8.0.2=pyhd8ed1ab_0 -> ipykernel[version='>=4.5.1'] -> __osx
  - ipywidgets==8.0.2=pyhd8ed1ab_0 -> ipykernel[version='>=4.5.1'] -> __win
  - kornia==0.6.7=pyhd8ed1ab_0 -> pytorch[version='>=1.10'] -> __osx[version='>=11.0']
  - pydeck==0.7.1=pyh6c4a22f_0 -> ipykernel -> __linux
  - pydeck==0.7.1=pyh6c4a22f_0 -> ipykernel -> __osx
  - pydeck==0.7.1=pyh6c4a22f_0 -> ipykernel -> __win
  - pysocks==1.7.1=pyha2e5f31_6 -> __unix
  - pytorch-lightning==1.6.5=pyhd8ed1ab_0 -> pytorch[version='>=1.8'] -> __osx[version='>=11.0']
  - torchmetrics==0.9.3=pyhd8ed1ab_0 -> pytorch[version='>=1.3.1'] -> __osx[version='>=11.0']
  - urllib3==1.26.11=pyhd8ed1ab_0 -> pysocks[version='>=1.5.6,<2.0,!=1.5.7'] -> __unix
  - urllib3==1.26.11=pyhd8ed1ab_0 -> pysocks[version='>=1.5.6,<2.0,!=1.5.7'] -> __win
```
Any idea what I should do with it?
Edit: maybe it's somehow related to the Python version. I have 3.9 and the requirements file specifies 3.10.
Still, I have the same issue as OP and no idea what to do about it.
The env file should create a python environment with python 3.10. Whatever version you are using before creating the environment shouldn’t matter.
I can recommend trying to use mamba instead of conda. I have experienced different dependency resolution with it before. Try installing mamba in your conda environment according to the instructions, and then try creating the environment using mamba instead of conda.
No luck with mamba:
```
Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement k-diffusion==0.0.1 (from versions: none)
ERROR: No matching distribution found for k-diffusion==0.0.1
```
Edit: I've managed to install all dependencies with mamba. Now it fails with `ModuleNotFoundError: No module named 'ldm'`.
OK, I've won the fight with the modules. I'm still getting that error, though, and generation is extremely slow.
Could this performance issue be because I have only 8 GB of RAM? I've seen in multiple discussions that problems occur mostly on M1 machines with less RAM. If so, what can I tweak to improve performance?
I've got 16 GB and generation still takes 5-10 minutes. There are a lot of users reporting 15-30 seconds (at least they say so). I was thinking the slowness is somehow related to the fallback to the CPU.
@underlow What version of macOS are you on?
I am on the Ventura Beta, and when I updated to the latest build (22A5331f), I saw an immediate 5x performance boost. I am at ~1 s/it on an M1 Pro with 16 GB RAM for standard 512x512 images (so 30 seconds for 30 steps, etc.). I haven't been able to find any official documentation for why this speed boost would happen, but it's worth exploring.
For comparison, my M1 with 8GB on Monterey is in the 20s/it range.
Latest, but not beta. I've retried a clean setup several times and it works better now: 3 minutes instead of 15.
> I've got 16 GB and generation still takes 5-10 minutes. There are a lot of users reporting 15-30 seconds (at least they say so). I was thinking the slowness is somehow related to the fallback to the CPU.
On an M1? My generations take 3 minutes as well (M1 Pro). With one iteration I get 15 seconds, but the result is nothing useful at all.
> On an M1? My generations take 3 minutes as well (M1 Pro). With one iteration I get 15 seconds, but the result is nothing useful at all.
I've got 3-5 minutes now, but it used to average 15 and sometimes reach 30. I thought it was somehow related to this error; apparently not.
After following the installation instructions, I have the same warning:
```
The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1659484612588/work/aten/src/ATen/mps/MPSFallback.mm:11.)
```
Despite that, the generation of a 512x512 image with 50 steps and a CFG scale of 7.5 (so all defaults) takes approx 60s and the GPU load approaches 85% for the python3.10 process in Activity Monitor during the whole process.
The system is an MBP 16" 2021 with an M1 Pro and 32 GB RAM running macOS Monterey 12.5.1.
I'm not sure how to make the error go away, but at least it works.
The source of the warning is `ldm/modules/embedding_manager.py`; I think this means that turning the prompt into embeddings is what falls back to the CPU. Generating the image could still be happening on the GPU. We can see whether there is a difference once an MPS implementation of `aten::nonzero` is checked in (tracked in https://github.com/pytorch/pytorch/issues/77764, as mentioned above).
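Judging from the two messages quoted at the top of the thread (the `aten::nonzero` warning at `embedding_manager.py:152` and the `aten::_index_put_impl_` error at line 155), the pattern in question is presumably something like the following self-contained reproduction; the shapes and variable names here are assumptions, not the file's exact code:

```python
import torch

# Paraphrased reproduction of the pattern the traceback points at
# (shapes and variable names are assumptions, not the file's exact code).
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

tokenized_text = torch.randint(0, 100, (1, 77), device=device)  # stand-in for the tokenized prompt
placeholder_token = torch.tensor(42, device=device)
embedded_text = torch.randn(1, 77, 768, device=device)          # stand-in for the token embeddings
placeholder_embedding = torch.randn(768, device=device)

# Single-argument torch.where dispatches to aten::nonzero (the quoted warning),
# and the advanced-index assignment dispatches to aten::_index_put_impl_
# (the quoted NotImplementedError when PYTORCH_ENABLE_MPS_FALLBACK is unset).
placeholder_idx = torch.where(tokenized_text == placeholder_token)
embedded_text[placeholder_idx] = placeholder_embedding
```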
On my MacBook Air M1 16 GB (2020) it takes 3 minutes to generate an image with the default settings of `dream.py`. That feels too fast to be generating the image on the CPU.
It might be possible to change embedding_manager to use something that is already implemented on MPS instead of `aten::nonzero`, but I have no idea how.
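One direction that might work is the three-argument, element-wise form of `torch.where`, which never materializes index tensors and therefore avoids both `aten::nonzero` and `aten::_index_put_impl_`. Purely a sketch under assumed shapes, not a tested patch to the repo:

```python
import torch

# Sketch of an aten::nonzero-free alternative (assumed shapes, not a tested patch):
# embedded_text: (batch, seq, dim), tokenized_text: (batch, seq), placeholder_embedding: (dim,)
def replace_placeholder(embedded_text, tokenized_text, placeholder_token, placeholder_embedding):
    mask = (tokenized_text == placeholder_token).unsqueeze(-1)  # (batch, seq, 1) boolean mask
    # Element-wise where broadcasts the placeholder embedding over masked positions,
    # so no index tensors are built and no index_put is needed.
    return torch.where(mask, placeholder_embedding, embedded_text)
```

Note that the original code assigns into `embedded_text` in place, while this returns a new tensor, so a real change would have to account for that (and for whether element-wise `where` is itself supported on MPS in the installed nightly).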
I am getting that same warning but images are generating in well under a minute so I think it is using the GPU. Just one more data point...
Same warning for M3