Getting an error on the pipe.enable_xformers_memory_efficient_attention() step

ajaysolanky commented 1 year ago

Getting this error on the pipe.enable_xformers_memory_efficient_attention() step:

RuntimeError                              Traceback (most recent call last)
<ipython-input-5-4f0f5a2b85ee> in <module>
     12     )
     13 pipe = pipe.to(device)
---> 14 pipe.enable_xformers_memory_efficient_attention()
     15 
     16 if model_id.endswith('-base'):

15 frames
/usr/local/lib/python3.8/dist-packages/xformers/ops.py in no_such_operator(*args, **kwargs)
     44 def _get_xformers_operator(name: str):
     45     def no_such_operator(*args, **kwargs):
---> 46         raise RuntimeError(
     47             f"No such operator xformers::{name} - did you forget to build xformers with `python setup.py develop`?"
     48         )

RuntimeError: No such operator xformers::efficient_attention_forward_cutlass - did you forget to build xformers with `python setup.py develop`?

Any advice? Thanks

woctezuma commented 1 year ago

The error indicates that xformers has not been installed correctly: the Python package can be imported, but its compiled operators are missing. The installation of the pre-compiled wheel must have failed, because:

either there was no pre-compiled wheel for your GPU on Google Colab,
or the download of the pre-compiled wheel failed for some reason.

There are 5 possible GPUs on Colab: A100, K80, P100, T4, V100. What is the output of the following command?

!nvidia-smi -q
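Since the full log is long, here is a minimal sketch that pulls only the GPU name out of the `nvidia-smi -q` output and compares it with the five models listed above. The function names `gpu_product_names` and `is_listed_gpu` are made up for this example, and the substring match on the model name is an assumption about how the product names are spelled.

```python
# GPU models listed earlier in this thread.
LISTED_GPUS = ("A100", "K80", "P100", "T4", "V100")


def gpu_product_names(nvidia_smi_log):
    """Return every 'Product Name' value found in `nvidia-smi -q` output."""
    names = []
    for line in nvidia_smi_log.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, separator, value = line.partition(":")
        if separator and key.strip() == "Product Name":
            names.append(value.strip())
    return names


def is_listed_gpu(product_name):
    """Check whether the product name mentions one of the listed models."""
    return any(model in product_name for model in LISTED_GPUS)


# Possible usage on a machine where nvidia-smi is available:
# import subprocess
# log = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout
# for name in gpu_product_names(log):
#     print(name, is_listed_gpu(name))
```

This only identifies the GPU; it does not tell you whether the xformers wheel itself was installed correctly.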

Moreover, if you cannot solve the issue, you can still comment out the line where the error happens:

- pipe.enable_xformers_memory_efficient_attention()
+ # pipe.enable_xformers_memory_efficient_attention()

The only downside is that you won't be able to generate several 768x768 images in one batch.

In this case, change these lines to generate 512x512 images instead of 768x768 images:

- model_id = "stabilityai/stable-diffusion-2"
+ model_id = "stabilityai/stable-diffusion-2-base"

If you really want to generate a 768x768 image instead of 512x512 images, then change the batch size instead of model_id:

- num_images = 4
+ num_images = 1

However, you risk running out of memory on Colab if you run inference several times at the 768x768 resolution without xformers.
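Instead of commenting the line out by hand, the call could be guarded so that the notebook keeps running either way. This is only a sketch: `enable_xformers_or_fallback` is a hypothetical helper, it assumes the failure surfaces at the enable call (as in the traceback above), and it catches a broad `Exception` because the exact exception type may differ between versions. On failure it returns a batch size of 1, matching the `num_images = 1` suggestion above.

```python
def enable_xformers_or_fallback(pipe, num_images=4):
    """Try to enable xformers; on failure, fall back to a batch size of 1.

    Returns a tuple (xformers_enabled, num_images_to_use).
    """
    try:
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as error:  # broad on purpose: the exception type may vary
        print(f"xformers unavailable, continuing without it: {error}")
        return False, 1
    return True, num_images


# Possible usage in the notebook, right after `pipe = pipe.to(device)`:
# xformers_enabled, num_images = enable_xformers_or_fallback(pipe, num_images=4)
```

The same flag could also be used to switch `model_id` to the `-base` model before the pipeline is loaded, if you prefer 512x512 images over a smaller batch.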

ajaysolanky commented 1 year ago

I see, thanks

Output of !nvidia-smi -q is:

==============NVSMI LOG==============

Timestamp                                 : Sun Dec 11 17:03:37 2022
Driver Version                            : 460.32.03
CUDA Version                              : 11.2

Attached GPUs                             : 1
GPU 00000000:00:04.0
    Product Name                          : Tesla T4
    Product Brand                         : Tesla
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1561120016390
    GPU UUID                              : GPU-1b965d81-1d0f-6501-027c-1ecc1437f5ea
    Minor Number                          : 0
    VBIOS Version                         : 90.04.A7.00.01
    MultiGPU Board                        : No
    Board ID                              : 0x4
    GPU Part Number                       : 900-2G183-6300-T00
    Inforom Version
        Image Version                     : G183.0200.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU Virtualization Mode
        Virtualization Mode               : Pass-Through
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x00
        Device                            : 0x04
        Domain                            : 0x0000
        Device Id                         : 0x1EB810DE
        Bus Id                            : 00000000:00:04.0
        Sub System Id                     : 0x12A210DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 15109 MiB
        Used                              : 0 MiB
        Free                              : 15109 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 2 MiB
        Free                              : 254 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 4 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 61 C
        GPU Shutdown Temp                 : 96 C
        GPU Slowdown Temp                 : 93 C
        GPU Max Operating Temp            : 85 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 28.67 W
        Power Limit                       : 70.00 W
        Default Power Limit               : 70.00 W
        Enforced Power Limit              : 70.00 W
        Min Power Limit                   : 60.00 W
        Max Power Limit                   : 70.00 W
    Clocks
        Graphics                          : 585 MHz
        SM                                : 585 MHz
        Memory                            : 5000 MHz
        Video                             : 750 MHz
    Applications Clocks
        Graphics                          : 585 MHz
        Memory                            : 5001 MHz
    Default Applications Clocks
        Graphics                          : 585 MHz
        Memory                            : 5001 MHz
    Max Clocks
        Graphics                          : 1590 MHz
        SM                                : 1590 MHz
        Memory                            : 5001 MHz
        Video                             : 1470 MHz
    Max Customer Boost Clocks
        Graphics                          : 1590 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Processes                             : None

woctezuma commented 1 year ago

Your GPU is supported.

Product Name                          : Tesla T4

I don't know why the installation of xformers would have failed. Colab has given me the same GPU, so I can test.

Tesla T4

First, I see a warning that was not there before when running the scheduler cell.

from diffusers import EulerDiscreteScheduler

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

WARNING:xformers:WARNING: /usr/local/lib/python3.8/dist-packages/xformers/_C.so: undefined symbol: _ZNK3c104impl13OperatorEntry20reportSignatureErrorENS0_12CppSignatureE
Need to compile C++ extensions to get sparse attention support. Please run python setup.py build develop

/usr/local/lib/python3.8/dist-packages/xformers/_C.so: undefined symbol: _ZNK3c104impl13OperatorEntry20reportSignatureErrorENS0_12CppSignatureE

Then I get the same error which you reported. 🤔

woctezuma commented 1 year ago

The issue should be fixed with the new wheels for xformers. Thank you for reporting this issue!

For reference, the wheels correspond to this version of PyTorch:

import torch

print(torch.__version__)

1.13.0+cu116

and were originally published in Facebook's official repository under the following name:

xformers-ubuntu-22.04-py3.8-torch1.13.0+cu116.whl (45.2 MB)
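Because the artifact name carries both the Python and PyTorch versions, a quick check against the running environment could catch a mismatch before installing. The sketch below assumes the `-py<version>-torch<version>.whl` naming pattern seen in the single file name above; that pattern is inferred from this one example, not from a documented convention, and the helper names are made up for illustration.

```python
import re

# Pattern inferred from the one artifact name quoted in this thread.
ARTIFACT_PATTERN = re.compile(
    r"-py(?P<python>\d+\.\d+)-torch(?P<torch>.+?)(?:\.whl)?$"
)


def parse_artifact_name(name):
    """Return (python_version, torch_version), or None if the name does not match."""
    match = ARTIFACT_PATTERN.search(name)
    if match is None:
        return None
    return match.group("python"), match.group("torch")


def artifact_matches(name, python_version, torch_version):
    """Check that the artifact was built for the given Python and PyTorch versions."""
    return parse_artifact_name(name) == (python_version, torch_version)


# Possible usage inside the notebook:
# import sys, torch
# running_python = f"{sys.version_info.major}.{sys.version_info.minor}"
# artifact = "xformers-ubuntu-22.04-py3.8-torch1.13.0+cu116.whl"
# print(artifact_matches(artifact, running_python, torch.__version__))
```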

I downloaded and unzipped the file from the official GitHub repository, then computed the hash of the extracted wheel on Windows:

CertUtil -hashfile "xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl" SHA256

966510b8cedf9291dc8b8da3314935d1913438531e39a3f4953ead831a743567

I also downloaded the file from the unofficial GitHub repository, and its hash matches the official one:

github_url = "https://github.com/brian6091/xformers-wheels"
xformer_id = "0.0.15.dev0+4c06c79"
xformers_wheels = f"xformers-{xformer_id}.d20221205-cp38-cp38-linux_x86_64.whl"
!wget {github_url}/releases/download/{xformer_id}/{xformers_wheels}
!sha256sum {xformers_wheels}

966510b8cedf9291dc8b8da3314935d1913438531e39a3f4953ead831a743567
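The same comparison can be done in Python on any platform, without CertUtil or sha256sum. This is a minimal sketch using the standard `hashlib` module; `sha256_of_file` and `wheel_is_authentic` are names chosen for this example, and the expected value is the hash quoted above.

```python
import hashlib

# SHA256 reported above for the xformers wheel.
EXPECTED_SHA256 = "966510b8cedf9291dc8b8da3314935d1913438531e39a3f4953ead831a743567"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_is_authentic(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Possible usage after the wget step:
# print(wheel_is_authentic("xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl"))
```

Reading in chunks keeps memory use low, which matters little for a 45 MB wheel but costs nothing.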

woctezuma / stable-diffusion-colab

Getting an error on the pipe.enable_xformers_memory_efficient_attention() step #6