microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] 'StableDiffusionPipeline' object has no attribute 'children' #2968

Closed stevensu1977 closed 1 year ago

stevensu1977 commented 1 year ago

No luck; I get 'StableDiffusionPipeline' object has no attribute 'children'. Maybe I am using the wrong diffusers version? diffusers 0.13.1, deepspeed 0.8.2

File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/auto_tp.py", line 19, in get_module_list AttributeError: 'StableDiffusionPipeline' object has no attribute 'children' 'StableDiffusionPipeline' object has no attribute 'children'

molly-smith commented 1 year ago

Hi @stevensu1977 , can you provide the script you used?

sjkoo1989 commented 1 year ago

Same here. My script looks like the one below:

from diffusers import StableDiffusionPipeline
import deepspeed

....

async def startup_event():
    app.state.pipe = StableDiffusionPipeline.from_pretrained(
        settings.model_name_or_path,
        revision="fp16",
        torch_dtype=torch.float16,
    ).to(settings.device)

    deepspeed.init_inference(
        model=getattr(app.state.pipe, "model", app.state.pipe),  # Transformers models
        mp_size=1,  # number of GPUs
        dtype=torch.float16,  # dtype of the weights (fp16)
        replace_method="auto",  # lets DeepSpeed automatically identify the layers to replace
        replace_with_kernel_inject=False,  # replace the model with the kernel injector
    )

async def generate(request: GenerationRequest):
    with torch.inference_mode():
        generated_images = app.state.pipe(
            prompt=request.prompts,
            num_inference_steps=request.num_inference_steps,
            guidance_scale=request.guidance_scale,
            negative_prompt=request.negative_prompts,
            num_images_per_prompt=request.num_images_per_prompt,
        )

        img_list = [from_image_to_bytes(generated_image) for generated_image in generated_images.images]
        return JSONResponse(img_list)
sjkoo1989 commented 1 year ago

I used Python 3.10 with requirements settings below

fastapi==0.86.0
pydantic
uvicorn==0.19.0

accelerate
diffusers
torch
transformers

deepspeed
triton==2.0.0
sjkoo1989 commented 1 year ago
  File "/./app.py", line 51, in startup_event
    deepspeed.init_inference(
  File "/usr/local/lib/python3.10/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 139, in __init__
    parser_dict = AutoTP.tp_parser(model)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py", line 98, in tp_parser
    module_list = AutoTP.get_module_list(model)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py", line 19, in get_module_list
    for child in model.children():
AttributeError: 'StableDiffusionPipeline' object has no attribute 'children'
sjkoo1989 commented 1 year ago

It looks like the bug can only be reproduced in DeepSpeed 0.8.2 or later (it does not depend on the diffusers version).

gaziqbal commented 1 year ago

I am running into the same issue (using diffusers==0.14.0). @sjkoo1989, which version of DeepSpeed were you able to run?

On my end, 0.8.1 fails with:

  File "/home/ubuntu/DeepSpeed-MII/.venv/lib/python3.8/site-packages/deepspeed/module_inject/auto_tp.py", line 35, in supported
    if key.group(1).lower() in unsupported:
AttributeError: 'NoneType' object has no attribute 'group'

And 0.8.0 and 0.7.7 fail with:

AttributeError: module 'diffusers.models.vae' has no attribute 'AutoencoderKL'
gaziqbal commented 1 year ago

There is a bit more progress after reverting to diffusers 0.11.1 and deepspeed 0.8.0. The server loads now, but it crashes at inference in _fwd_kernel:

        qk += tl.dot(q, k, trans_b=True)
                        ^
molly-smith commented 1 year ago

Hi @stevensu1977 and @gaziqbal, can you try setting kernel injection to True?
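
For example (a minimal sketch; the checkpoint name is only a placeholder):

import torch
import deepspeed
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# replace_with_kernel_inject=True makes DeepSpeed swap in its optimized kernels
# for the supported submodules (e.g. the UNet); the default AutoTP path walks
# model.children(), which a pipeline object does not have.
engine = deepspeed.init_inference(
    pipe,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)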

BogdanDarius commented 1 year ago

@molly-smith Loading the model works for me with the following packages:

accelerate==0.17.0
deepspeed==0.8.2
diffusers==0.14.0
transformers==4.26.1
triton==2.0.0
torch==1.13.1

I run the pipeline like this:

deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16, replace_with_kernel_inject=True, enable_cuda_graph=True)

But during inference it fails with the following error:

TypeError: DSUNet._forward() got an unexpected keyword argument 'cross_attention_kwargs'
gaziqbal commented 1 year ago

Likewise as @BogdanDarius - if I explicitly set config.replace_with_kernel_inject = True in InferenceEngine.init then the model (CompVis/stable-diffusion-v1-4) loads but still crashes on inference.

diffusers 0.14.0 and 0.13.0 crash with the following error:

grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Exception calling application: _forward() got an unexpected keyword argument 'cross_attention_kwargs'"
        debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:50050 {created_time:"2023-03-14T00:04:46.791932314+00:00", grpc_status:2, grpc_message:"Exception calling application: _forward() got an unexpected keyword argument \'cross_attention_kwargs\'"}"

diffusers 0.11.1 crashes with the same error as above: https://github.com/microsoft/DeepSpeed/issues/2968#issuecomment-1466914276

sjkoo1989 commented 1 year ago

The following version settings work:

accelerate
diffusers==0.6.0
torch
transformers[sentencepiece]==4.24.0

deepspeed==0.7.4
triton==2.0.0.dev20221030
molly-smith commented 1 year ago

Disregard PR https://github.com/microsoft/DeepSpeed/pull/3083

JacquiML commented 1 year ago

Hey @molly-smith, I hit exactly the same error as @BogdanDarius, and I updated deepspeed/inference/engine.py based on PR https://github.com/microsoft/DeepSpeed/pull/3083, but I still get this error:

File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 667, in __call__
    noise_pred = self.unet(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/model_implementations/diffusers/unet.py", line 41, in forward
    return self._forward(*inputs, **kwargs)
TypeError: DSUNet._forward() got an unexpected keyword argument 'cross_attention_kwargs'

Thank you!

JacquiML commented 1 year ago

Hey @molly-smith, if I use model = deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16) to disable kernel injection, I get the previous error 'StableDiffusionPipeline' object has no attribute 'children':

[2023-03-27 02:21:13,610] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-03-27 02:21:13,611] [INFO] [logging.py:93:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Traceback (most recent call last):
  File "/test.py", line 17, in <module>
    model = deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 139, in __init__
    parser_dict = AutoTP.tp_parser(model)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py", line 98, in tp_parser
    module_list = AutoTP.get_module_list(model)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/module_inject/auto_tp.py", line 19, in get_module_list
    for child in model.children():
AttributeError: 'StableDiffusionPipeline' object has no attribute 'children'
oliao28 commented 1 year ago

I followed every step of this thread and landed on the same error as @JacquiML.

molly-smith commented 1 year ago

Hi all, sorry for the delay. The error 'StableDiffusionPipeline' object has no attribute 'children' occurs because you need to enable kernel injection.

The error TypeError: DSUNet._forward() got an unexpected keyword argument 'cross_attention_kwargs' is caused by changes in the latest releases of diffusers (versions 0.13.0 and above). I am working on a fix. In the meantime, let me know if enabling kernel injection and using diffusers 0.12.0 or below works for you. Thanks.
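
In requirements terms, that would be @BogdanDarius's set above with diffusers pinned back, i.e. something like this (an untested sketch):

accelerate==0.17.0
deepspeed==0.8.2
diffusers==0.12.0
transformers==4.26.1
triton==2.0.0
torch==1.13.1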

JacquiML commented 1 year ago

Hey @molly-smith, I downgraded diffusers to 0.11.1 and then hit the error below. Thanks in advance for your support!

Time to load spatial_inference op: 21.2737238407135 seconds
**** found and replaced unet w. <class 'deepspeed.model_implementations.diffusers.unet.DSUNet'>
  0%|          | 0/50 [00:00<?, ?it/s]
------------------------------------------------------
Free memory : 18.335083 (GigaBytes)  
Total memory: 22.199097 (GigaBytes)  
Requested memory: 1.015625 (GigaBytes) 
Setting maximum total tokens (input + output) to 4096 
------------------------------------------------------
  0%|          | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0-83ca8b715a9dc5f32dc1110973485f64-d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-3d2aedeb40d6d81c66a42791e268f98b-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, 'fp32', torch.float32, torch.float16, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, 64, 128), (True, True, True, (False,), True, True, (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True), (False, False), (False, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 937, in build_triton_ir
    generator.visit(fn.parse())
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 183, in visit_Module
    ast.NodeVisitor.generic_visit(self, node)
  File "/opt/conda/lib/python3.10/ast.py", line 426, in generic_visit
    self.visit(item)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 252, in visit_FunctionDef
    has_ret = self.visit_compound_statement(node.body)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 678, in visit_For
    self.visit_compound_statement(node.body)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 319, in visit_AugAssign
    self.visit(assign)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 301, in visit_Assign
    values = self.visit(node.value)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 339, in visit_BinOp
    rhs = self.visit(node.right)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/opt/conda/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 797, in visit_Call
    return fn(*args, _builder=self.builder, **kws)
  File "/opt/conda/lib/python3.10/site-packages/triton/impl/base.py", line 22, in wrapper
    return fn(*args, **kwargs)
TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/performance_optimisation/deepspeed/generate_image_benchmark.py", line 20, in <module>
    print(model(prompt))
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 562, in forward
    outputs = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 529, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/model_implementations/diffusers/unet.py", line 41, in forward
    return self._forward(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/model_implementations/diffusers/unet.py", line 63, in _forward
    return self.unet(sample, timestamp, encoder_hidden_states, return_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 424, in forward
    sample, res_samples = downsample_block(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 777, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
    hidden_states = block(hidden_states, encoder_hidden_states=encoder_hidden_states, timestep=timestep)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_transformer_block.py", line 106, in forward
    out_attn_1 = self.attn_1(out_norm_1)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_attention.py", line 228, in forward
    output = DeepSpeedDiffusersAttentionFunction.apply(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_attention.py", line 117, in forward
    output = selfAttention_fp(input, context, input_mask)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/diffusers_attention.py", line 81, in selfAttention_fp
    context_layer = triton_flash_attn_kernel(qkv_out[0],
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/triton_ops.py", line 120, in forward
    _fwd_kernel[grid](
  File "<string>", line 41, in _fwd_kernel
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 1621, in compile
    next_module = compile(module)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 1550, in <lambda>
    lambda src: ast_to_ttir(src, signature, configs[0], constants)),
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 962, in ast_to_ttir
    mod, _ = build_triton_ir(fn, signature, specialization, constants)
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 942, in build_triton_ir
    raise CompilationError(fn.src, node) from e
triton.compiler.CompilationError: at 58:24:
def _fwd_kernel(
    Q,
    K,
    V,
    sm_scale,
    TMP,
    Out,
    stride_qz,
    stride_qh,
    stride_qm,
    stride_qk,
    stride_kz,
    stride_kh,
    stride_kn,
    stride_kk,
    stride_vz,
    stride_vh,
    stride_vk,
    stride_vn,
    stride_oz,
    stride_oh,
    stride_om,
    stride_on,
    Z,
    H,
    N_CTX,
    BLOCK_M: tl.constexpr,
    BLOCK_DMODEL: tl.constexpr,
    BLOCK_N: tl.constexpr,
):
    start_m = tl.program_id(0)
    off_hz = tl.program_id(1)
    # initialize offsets
    offs_m = start_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_DMODEL)
    off_q = off_hz * stride_qh + offs_m[:, None] * stride_qm + offs_d[None, :] * stride_qk
    off_k = off_hz * stride_kh + offs_n[:, None] * stride_kn + offs_d[None, :] * stride_kk
    off_v = off_hz * stride_vh + offs_n[:, None] * stride_qm + offs_d[None, :] * stride_qk
    # Initialize pointers to Q, K, V
    q_ptrs = Q + off_q
    k_ptrs = K + off_k
    v_ptrs = V + off_v
    # initialize pointer to m and l
    t_ptrs = TMP + off_hz * N_CTX + offs_m
    m_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float("inf")
    l_i = tl.zeros([BLOCK_M], dtype=tl.float32)
    acc = tl.zeros([BLOCK_M, BLOCK_DMODEL], dtype=tl.float32)
    # load q: it will stay in SRAM throughout
    q = tl.load(q_ptrs)
    # loop over k, v and update accumulator
    for start_n in range(0, N_CTX, BLOCK_N):
        start_n = tl.multiple_of(start_n, BLOCK_N)
        # -- compute qk ----
        k = tl.load(k_ptrs + start_n * stride_kn)

        qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
        qk += tl.dot(q, k, trans_b=True)
molly-smith commented 1 year ago

For diffusers 0.13.0 or above, please try https://github.com/microsoft/DeepSpeed/pull/3142

molly-smith commented 1 year ago

@JacquiML, I think you may need a different triton version. It should be triton 2.0.0.dev20221202.

JacquiML commented 1 year ago

Hey @molly-smith, thanks for the replies above!

For diffusers 0.13.0 or above, please try https://github.com/microsoft/DeepSpeed/pull/3142

Great! It works on diffusers 0.14.0.

@JacquiML, I think you may need a different triton version. It should be triton 2.0.0.dev20221202.

Yep, it works on diffusers 0.14.0 with triton 2.0.0.dev20221202. But triton 2.0.0.dev20221202 needs PyTorch 1.13.1, so installing it downgrades PyTorch 2.0 to 1.13.1. Could DeepSpeed support PyTorch 2.0 too? Thanks!
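
For reference, the pin set that works for me looks roughly like this (with deepspeed built from PR https://github.com/microsoft/DeepSpeed/pull/3142, since it is not yet in a release):

diffusers==0.14.0
torch==1.13.1
triton==2.0.0.dev20221202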

JacquiML commented 1 year ago

Hey @molly-smith, a follow-up question: do you know an ETA for when the merged PR https://github.com/microsoft/DeepSpeed/pull/3142 will be included in a new DeepSpeed release on PyPI? Many thanks!

ttio2tech commented 1 year ago

Hi @molly-smith, I was using diffusers 0.14, deepspeed 0.9.0, and pytorch 1.13 but still got this error.

jy00161yang commented 1 year ago

Hi all, sorry for the delay. The error 'StableDiffusionPipeline' object has no attribute 'children' occurs because you need to enable kernel injection.

The error TypeError: DSUNet._forward() got an unexpected keyword argument 'cross_attention_kwargs' is caused by changes in the latest releases of diffusers (versions 0.13.0 and above). I am working on a fix. In the meantime, let me know if enabling kernel injection and using diffusers 0.12.0 or below works for you. Thanks.

@molly-smith Thanks for your support. However, after I enabled kernel injection, the error became "module 'diffusers.models.attention' has no attribute 'CrossAttention'". The diffusers version was 0.15.0. Is there any solution to this?

Besides, when I downgraded diffusers to 0.11.1, the model loaded successfully, but during inference it shows: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

molly-smith commented 1 year ago

@jy00161yang, this fix is for diffusers 0.13.0 and 0.14.0; diffusers 0.15.0 was released after it. I will work on a fix for 0.15.0 soon.

molly-smith commented 1 year ago

Hey @molly-smith, a follow-up question: do you know an ETA for when the merged PR #3142 will be included in a new DeepSpeed release on PyPI? Many thanks!

@JacquiML It should be included in DeepSpeed v0.9.0.

DenisDiachkov commented 1 year ago

@jy00161yang

@molly-smith Thanks for your support. However, after I enabled kernel injection, the error became "module 'diffusers.models.attention' has no attribute 'CrossAttention'". The diffusers version was 0.15.0. Is there any solution to this?

I got the same problem. Did you manage to resolve it?

jackchen556 commented 3 months ago

You can pass replace_with_kernel_inject=True to deepspeed.init_inference(); that fixed the same problem for me.