siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>' #797

Closed by neiltian-tencent 4 months ago

neiltian-tencent commented 7 months ago

debug info:

  1. singledispatch_proxy wrapper first_param: `<function ProxySubmodule.__getattribute__.<locals>.<lambda> at 0x7fedb8479ab0>`
  2. MockEntityNameFormatter _format_full_class_name: `onediff.infer_compiler.transform.builtin_transform.ProxySubmodule.__getattribute__.<locals>.<lambda>`

error info: `onediff/src/onediff/infer_compiler/transform/builtin_transform.py:233 - convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'`

function info:
`<function ProxySubmodule.__getattribute__ at 0x7fedb847fe20>`

doombeaker commented 7 months ago

Could you tell me which model you are trying to convert with OneDiff?

It may be caused by some closure feature of Python.

neiltian-tencent commented 7 months ago

@doombeaker I tried to find an open source demo. The error-related code is in `__getattribute__(self, attribute)` (onediff/src/onediff/infer_compiler/transform/builtin_transform.py):

    elif attribute in ["forward", "_conv_forward"]:
        replacement = proxy_class(type(self._oflow_proxy_submod))
        return lambda *args, **kwargs: getattr(replacement, attribute)(
            self, *args, **kwargs
        )
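For context, a lambda defined inside a method gets a `__qualname__` containing `<locals>`, which is exactly what a name-based lookup trips over; a minimal standalone sketch (not OneDiff code):

```python
# Minimal standalone sketch (not OneDiff code): a lambda created inside a
# method has a __qualname__ containing "<locals>", so re-resolving the
# object from that dotted name with getattr() fails.
class ProxyDemo:
    def make_forward(self):
        replacement = object()
        return lambda *args, **kwargs: (replacement, args, kwargs)


fn = ProxyDemo().make_forward()
print(fn.__qualname__)  # ProxyDemo.make_forward.<locals>.<lambda>

# Walking that qualified name reproduces the reported error:
obj = ProxyDemo
for part in fn.__qualname__.split(".")[1:]:
    obj = getattr(obj, part)  # AttributeError: 'function' object has no attribute '<locals>'
```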

neiltian-tencent commented 7 months ago

@doombeaker The demo at https://github.com/MooreThreads/Moore-AnimateAnyone has similar errors.

neiltian-tencent commented 7 months ago

@doombeaker Could you tell me how to fix this issue? First of all, do I need to redefine the corresponding OneFlow class and register it with the register interface? https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master/src/models

neiltian-tencent commented 7 months ago

error info: the TemporalBasicTransformerBlock module's forward interface is hijacked by hacked_basic_transformer_inner_forward (https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/src/models/mutual_self_attention.py).
Does OneFlow support this situation? @strint @doombeaker
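For reference, the hijack pattern in mutual_self_attention.py roughly amounts to rebinding the module's forward to a closure; a simplified sketch of the pattern (names are illustrative, not the exact Moore-AnimateAnyone code):

```python
import torch.nn as nn

# Simplified sketch of the "hijacked forward" pattern: the module's forward
# is rebound to a closure, which is exactly the kind of <locals> function
# that the transform then fails to convert. Names are illustrative.
def hack_forward(module: nn.Module):
    original_forward = module.forward

    def hacked_forward(hidden_states, *args, **kwargs):
        # ... reference-attention bookkeeping would go here ...
        return original_forward(hidden_states, *args, **kwargs)

    module.forward = hacked_forward  # forward is now a <locals> closure


block = nn.Linear(4, 4)  # stand-in for TemporalBasicTransformerBlock
hack_forward(block)
print(block.forward.__qualname__)  # hack_forward.<locals>.hacked_forward
```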

neiltian-tencent commented 7 months ago

@strint @doombeaker When the TemporalBasicTransformerBlock module's forward is not hijacked (reference_attn is False), OneFlow can run the compile process, but it fails with the following error (https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/src/models/mutual_self_attention.py):

    ERROR run got error: <class 'oneflow._oneflow_internal.exception.Exception'>
    Cannot find the kernel matching Current OperatorConf. The Info of OperatorConf are
      op_name: model.up_blocks.0.upsamplers.0-upsample_nearest_3d-1965
      op_type_name: upsample_nearest_3d
      DeviceType_Name: kCUDA
      DataType_Name of x_0: kFloat16
      DataType_Name of y_0: kFloat16
    File "oneflow/core/job/job_interpreter.cpp", line 326, in InterpretJob
      RunNormalOp(launch_context, launch_op, inputs)
    File "oneflow/core/job/job_interpreter.cpp", line 238, in RunNormalOp
      it.Apply(op, inputs, &outputs, OpExprInterpContext(empty_attr_map, JUST(launch_op.device)))
    File "oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 155, in NaiveInterpret
      PhysicalRun([&](InstructionsBuilder builder) -> Maybe ... output_eager_blob_objects), ctx, result->stream()); })
    File "oneflow/core/framework/instructions_builder.h", line 168, in PhysicalRun
      Build(&instructions_builder)
    File "oneflow/core/framework/instructions_builder.cpp", line 400, in Call
      vm::OpCallInstructionPolicy::New( vm_stream, opkernel ... global_tensor_infer_result, ctx, *one::CurrentDevVmDepObjectConsumeMode())
    File "oneflow/core/vm/op_call_instruction_policy.h", line 50, in New
      ptr->Init()
    File "oneflow/user/kernels/stateful_opkernel.cpp", line 920, in ChooseOpKernel
      user_op::UserOpRegistryMgr::Get().GetOpKernelRegistryResult(op_type_name, reg_ctx)
    Error Type: oneflow.ErrorProto.op_kernel_not_found_error
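If it helps triage, the failing op can presumably be reproduced outside the pipeline with a 5D half-precision tensor; a rough sketch, assuming OneFlow's torch-compatible interpolate API:

```python
import oneflow as flow

# Rough repro sketch for the missing fp16 kernel (assumes OneFlow's
# torch-compatible interpolate API). A 5D half tensor with mode="nearest"
# should dispatch to upsample_nearest_3d on CUDA.
x = flow.randn(1, 320, 8, 32, 32, dtype=flow.float16, device="cuda")
y = flow.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")
print(y.shape, y.dtype)
```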

doombeaker commented 7 months ago

@wangerlie is working on it, and will let you know if there is any progress

zhangvia commented 6 months ago

Hey, are you trying to use OneDiff to accelerate AnimateAnyone? I'm also working on it, and I find that torch 2.2.0 + torch.compile + cuDNN 8.6 speeds up AnimateAnyone by about 30%. But it is really weird that if I run the pipeline twice in a row, the second result is exactly the same as the first, even if I change the reference image for the second run.

So I tried OneDiff, and the `Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'` error happens.

neiltian-tencent commented 6 months ago

> Hey, are you trying to use OneDiff to accelerate AnimateAnyone? I'm also working on it [...] So I tried OneDiff, and the `Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'` error happens.

@zhangvia Did you set torch.compile with dynamic=True?

zhangvia commented 6 months ago

My denoising UNet takes a fixed-shape latent as input, and I don't use torch.compile to compile the ReferenceNet, because the pipeline only runs the ReferenceNet once when generating a video. I did try setting dynamic=True; the bug disappears, but the pipeline costs more memory than with dynamic=False.
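For reference, my setup is roughly the following sketch (assuming the Moore-AnimateAnyone pipeline attribute names; only the denoising UNet is compiled):

```python
import torch

# Rough sketch of the setup described above (assumes the Moore-AnimateAnyone
# Pose2VideoPipeline exposes `denoising_unet`; only that UNet is compiled,
# since the ReferenceNet runs just once per video).
# pipe = Pose2VideoPipeline(...)  # built as in the Moore-AnimateAnyone demo
pipe.denoising_unet = torch.compile(pipe.denoising_unet, dynamic=False)

# Toggling dynamic=True and logging recompilation reasons is what is being
# compared in the follow-up comments, e.g.:
#   TORCH_LOGS=recompiles python scripts/pose2vid.py ...
```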

zhangvia commented 6 months ago

Sorry, the bug is still there when dynamic=True is set. But if I set os.environ['TORCH_LOGS'] = "recompiles", the bug disappears.

neiltian-tencent commented 6 months ago

> @strint @doombeaker TemporalBasicTransformerBlock module's forward is not hijacked (reference_attn is False), OneFlow can run the compile process, but it fails with the following error [...] Error Type: oneflow.ErrorProto.op_kernel_not_found_error

@doombeaker @wangerlie The upsample_nearest_3d error is due to the missing registration of the half-precision (float16) kernel. After adding the relevant implementations, I get CUDA out of memory:

    File "/data2/workspace/neiltian/onediff/src/onediff/infer_compiler/oneflow/utils.py", line 21, in wrapper
      return func(self, *args, **kwargs)
    File "/data2/workspace/neiltian/onediff/src/onediff/infer_compiler/utils/graph_management_utils.py", line 91, in wrapper
      ret = func(self, *args, **kwargs)
    File "/data2/workspace/neiltian/onediff/src/onediff/infer_compiler/oneflow/deployable_module.py", line 99, in forward
      output = dpl_graph(*args, **kwargs)
    File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 281, in __call__
      return self._dynamic_input_graph_cache(*args, **kwargs)
    File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/cache.py", line 115, in __call__
      return graph(*args, **kwargs)
    File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 284, in __call__
      self._compile(*args, **kwargs)
    File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 852, in _compile
      return self._compile_new(*args, **kwargs)
    File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 876, in _compile_new
      self.finish_compile_and_init_runtime()
    File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 1428, in finish_compile_and_init_runtime
      self._c_nn_graph.init_runtime()
    oneflow._oneflow_internal.exception.RuntimeError: Error: CUDA out of memory. Tried to allocate 292.4 GB

wangerlie commented 6 months ago

@neiltian-tencent Hi, I have used OneDiff to accelerate the Moore-AnimateAnyone demo (/MooreAnimateAnyone/scripts/pose2vid.py), and it doesn't seem to have any problem.
Could you tell me exactly how you use OneDiff to accelerate Moore-AnimateAnyone, and which model you are trying to accelerate?
It would be great if you could provide me with your code.

neiltian-tencent commented 6 months ago

@wangerlie I compile denoising_unet with oneflow_compile and hit `convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'`. compile_pipe has no acceleration effect, although the above error is not reported. What acceleration ratio did you measure here?
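For context, a minimal sketch of what that compilation looks like on my side (`pipe` is the already-built Pose2VideoPipeline; attribute names follow the Moore-AnimateAnyone repo):

```python
from onediff.infer_compiler import oneflow_compile

# Minimal sketch of the compilation described above (`pipe` is the already
# built Pose2VideoPipeline; attribute names follow the Moore-AnimateAnyone
# repo). Only the denoising UNet is wrapped.
pipe.denoising_unet = oneflow_compile(pipe.denoising_unet)
```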

neiltian-tencent commented 6 months ago

@wangerlie The AnimateAnyone pipeline has two UNets (denoising_unet and reference_unet). compile_pipe may filter out both of them; this is the OneDiff filter _PARTS list:

    _PARTS = [
        "text_encoder",
        "text_encoder_2",
        "image_encoder",
        "unet",
        "controlnet",
        "fast_unet",  # for deepcache
        "prior",  # for StableCascadePriorPipeline
        "decoder",  # for StableCascadeDecoderPipeline
        "vqgan.down_blocks",  # for StableCascadeDecoderPipeline
        "vqgan.up_blocks",  # for StableCascadeDecoderPipeline
        "vae.decoder",
        "vae.encoder",
    ]

neiltian-tencent commented 6 months ago

After adding denoising_unet to _PARTS, the above error is reported.
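Concretely, the change that re-triggers the error is roughly this hypothetical edit (for illustration only):

```python
# Hypothetical edit for illustration: extending the _PARTS filter list so that
# compile_pipe also picks up the AnimateAnyone denoising UNet. With this
# change, the "'function' object has no attribute '<locals>'" error is
# reported again.
_PARTS = _PARTS + ["denoising_unet"]
```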

wangerlie commented 6 months ago

> @wangerlie I compile denoising_unet with oneflow_compile and hit `convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'`. compile_pipe has no acceleration effect, although the above error is not reported. What acceleration ratio did you measure here?

I used compile_pipe to accelerate the Pose2VideoPipeline and found the same problem: it has no acceleration effect.

wangerlie commented 6 months ago

> @wangerlie The AnimateAnyone pipeline has two UNets (denoising_unet and reference_unet). compile_pipe may filter out both of them; this is the OneDiff filter _PARTS list: [...]

@neiltian-tencent Following your tips, I reproduced the reported error and am working on the problem. Thanks for your patience.

wangerlie commented 4 months ago

The two components denoising_unet and reference_unet can be compiled after adding the following code to compile_pipe:

def compile_pipe(
    pipe,
    *,
    backend="oneflow",
    options=None,
    ignores=(),
    fuse_qkv_projections=False,
):
    if fuse_qkv_projections:
        pipe = fuse_qkv_projections_in_pipe(pipe)

    if backend == "nexfort" and isinstance(options, str):
        import json

        options = json.loads(options)

    if backend == "nexfort" and options is not None and "memory_format" in options:
        memory_format = getattr(torch, options["memory_format"])
        pipe = convert_pipe_to_memory_format(
            pipe, ignores=ignores, memory_format=memory_format
        )
        del options["memory_format"]

    # To fix the bug of graph load of vae. Please refer to: https://github.com/siliconflow/onediff/issues/452
    if (
        hasattr(pipe, "upcast_vae")
        and pipe.vae.dtype == torch.float16
        and pipe.vae.config.force_upcast
    ):
        pipe.upcast_vae()

    filtered_parts = _filter_parts(ignores=ignores)
    for part in filtered_parts:
        obj = _recursive_getattr(pipe, part, None)
        if obj is not None:
            logger.info(f"Compiling {part}")
            _recursive_setattr(
                pipe, part, compile(obj, backend=backend, options=options)
            )

    if hasattr(pipe, "image_processor") and "image_processor" not in ignores:
        logger.info("Patching image_processor")

        from onediffx.utils.patch_image_processor import (
            patch_image_prcessor as patch_image_prcessor_,
        )

        patch_image_prcessor_(pipe.image_processor)
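
    # Added for AnimateAnyone: also compile the two UNets that the default
    # _PARTS list does not cover.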

    if hasattr(pipe,"denoising_unet") and "denoising_unet" not in ignores:
        logger.info("Patching denoising_unet")
        obj = _recursive_getattr(pipe, "denoising_unet", None)
        _recursive_setattr( pipe,"denoising_unet",compile(obj, backend=backend, options=options))

    if hasattr(pipe,"reference_unet") and "reference_unet" not in ignores:
        logger.info("Patching reference_unet")
        obj = _recursive_getattr(pipe, "reference_unet", None)
        _recursive_setattr( pipe, "reference_unet", compile(obj, backend=backend, options=options))
    return pipe

The above problem is raised because the submodule BasicTransformerBlock in reference_unet and denoising_unet can't get a proper proxy at this line: https://github.com/siliconflow/onediff/blob/a6d2e95ea369b99c50dbde7830c1e254f85433a9/src/onediff/infer_compiler/backends/oneflow/transform/builtin_transform.py#L95
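With that patch in place, usage is then just the normal compile_pipe call; a sketch, assuming the pipeline is built as in the Moore-AnimateAnyone scripts/pose2vid.py demo:

```python
from onediffx import compile_pipe

# Usage sketch with the patched compile_pipe above: the standard parts plus
# denoising_unet and reference_unet all get compiled.
# pipe = Pose2VideoPipeline(...)  # built as in scripts/pose2vid.py
pipe = compile_pipe(pipe)

# The first inference call triggers the actual OneFlow graph compilation
# (slow); subsequent calls reuse the compiled graphs.
```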

neiltian-tencent commented 4 months ago

> @doombeaker I tried to find an open source demo. The error-related code is in `__getattribute__(self, attribute)` (onediff/src/onediff/infer_compiler/transform/builtin_transform.py) [...]

@wangerlie These are my previous debugging records.

strint commented 4 months ago

Thanks for your feedback.

We will move to the nexfort backend to avoid these conversion problems. Please take a look: https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions/examples/sdxl

@neiltian-tencent
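For example, switching backends in the compile_pipe call quoted above would look roughly like this (the options string is illustrative, not a required configuration):

```python
from onediffx import compile_pipe

# Sketch of using the nexfort backend instead of the oneflow one, following
# the compile_pipe signature shown earlier in this thread. The options JSON
# is illustrative.
pipe = compile_pipe(
    pipe,
    backend="nexfort",
    options='{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last"}',
)
```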