Closed neiltian-tencent closed 4 months ago
Could you tell me which model you are trying to convert with onediff?
It may be caused by some closure feature of Python.
@doombeaker I am trying to find an open-source demo. The error-related code is in def __getattribute__(self, attribute) (onediff/src/onediff/infer_compiler/transform/builtin_transform.py):
    elif attribute in ["forward", "_conv_forward"]:
        replacement = proxy_class(type(self._oflow_proxy_submod))
        return lambda *args, **kwargs: getattr(replacement, attribute)(
            self, *args, **kwargs
        )
@doombeaker The https://github.com/MooreThreads/Moore-AnimateAnyone demo has similar errors.
@doombeaker Could you tell me how to fix this issue? First of all, do I need to redefine the corresponding oneflow classes and register them through the register interface? https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master/src/models
Error info: the forward of the TemporalBasicTransformerBlock module is hijacked by hacked_basic_transformer_inner_forward (https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/src/models/mutual_self_attention.py).
Does oneflow support this situation? @strint @doombeaker
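To make the hijack concrete, here is a minimal pure-Python sketch with no oneflow or onediff involved. The Block class and hacked_forward function are hypothetical stand-ins, not the Moore-AnimateAnyone code. It only shows that a lookup through the class, as the proxy snippet above does with type(...), does not see a forward that was rebound on a single instance. Whether this is the actual cause of the failure is an assumption.

```python
class Block:
    """Hypothetical stand-in for a transformer block."""

    def forward(self, x):
        return x + 1


def hacked_forward(self, x):
    """Hypothetical replacement forward, bound to one instance below."""
    return x * 10


block = Block()
# Instance-level hijack: bind the function to this object only.
block.forward = hacked_forward.__get__(block, Block)

via_instance = block.forward(2)             # uses the hijacked forward
via_class = type(block).forward(block, 2)   # class lookup bypasses the hijack
print(via_instance, via_class)
```

If a converter resolves forward from the class rather than from the instance, the hijacked behaviour would be lost or mismatched in the same way.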
@strint @doombeaker When the forward of the TemporalBasicTransformerBlock module is not hijacked (reference_attn is False), oneflow can execute the compile process, although the following error is reported. (https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/src/models/mutual_self_attention.py)
ERROR run got error: <class 'oneflow._oneflow_internal.exception.Exception'> Cannot find the kernel matching Current OperatorConf. The Info of OperatorConf are
 op_name: model.up_blocks.0.upsamplers.0-upsample_nearest_3d-1965
 op_type_name: upsample_nearest_3d
 DeviceType_Name: kCUDA
 DataType_Name of x_0: kFloat16
 DataType_Name of y_0: kFloat16
  File "oneflow/core/job/job_interpreter.cpp", line 326, in InterpretJob
    RunNormalOp(launch_context, launch_op, inputs)
  File "oneflow/core/job/job_interpreter.cpp", line 238, in RunNormalOp
    it.Apply(op, inputs, &outputs, OpExprInterpContext(empty_attr_map, JUST(launch_op.device)))
  File "oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 155, in NaiveInterpret
    PhysicalRun([&](InstructionsBuilder builder) -> Maybe ... output_eager_blob_objects), ctx, result->stream()); })
  File "oneflow/core/framework/instructions_builder.h", line 168, in PhysicalRun
    Build(&instructions_builder)
  File "oneflow/core/framework/instructions_builder.cpp", line 400, in Call
    vm::OpCallInstructionPolicy::New( vm_stream, opkernel ... global_tensor_infer_result, ctx, *one::CurrentDevVmDepObjectConsumeMode())
  File "oneflow/core/vm/op_call_instruction_policy.h", line 50, in New
    ptr->Init()
  File "oneflow/user/kernels/stateful_opkernel.cpp", line 920, in ChooseOpKernel
    user_op::UserOpRegistryMgr::Get().GetOpKernelRegistryResult(op_type_name, reg_ctx)
Error Type: oneflow.ErrorProto.op_kernel_not_found_error
@wangerlie is working on it, and will let you know if there is any progress
Hey, are you trying to use onediff to accelerate Animate Anyone? I'm also working on it. I found that torch 2.2.0 + torch.compile + cuDNN 8.6 speeds up Animate Anyone by about 30%. But it is really weird: if I run the pipeline twice in a row, the second result is exactly the same as the first, even if I change the reference image for the second run.
So I tried onediff, and this error occurs:
Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'
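For readers wondering where '<locals>' comes from: a function or lambda created inside another function carries '<locals>' in its __qualname__. The sketch below is not onediff code. resolve_by_qualname is a hypothetical resolver, written only to show one plausible way this exact message can be produced when something tries to look such an object up by name. It assumes the snippet runs at module level.

```python
def make_forward():
    # A lambda created inside a function is a local object.
    return lambda *args, **kwargs: (args, kwargs)


def resolve_by_qualname(obj, namespace):
    """Hypothetical resolver: walk obj.__qualname__ one attribute at a time."""
    parts = obj.__qualname__.split(".")
    target = namespace[parts[0]]
    for name in parts[1:]:
        target = getattr(target, name)
    return target


fn = make_forward()
qualname = fn.__qualname__
print(qualname)

message = None
try:
    resolve_by_qualname(fn, {"make_forward": make_forward})
except AttributeError as exc:
    # The walk fails at the '<locals>' segment of the qualified name.
    message = str(exc)
    print(message)
```

The lambda returned by __getattribute__ in builtin_transform.py is such a local object, which would fit the "closure feature" guess earlier in the thread, but that link is an assumption.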
@zhangvia torch.compile dynamic=True?
My denoising UNet takes a fixed-shape latent as input, and I don't use torch.compile on the referencenet, because the pipeline runs the referencenet only once when generating a video. I did try setting dynamic=True. The bug disappears, but the pipeline uses more memory than with dynamic=False.
Sorry, the bug is still there with dynamic=True. But if I set os.environ['TORCH_LOGS']="recompiles", the bug disappears.
@doombeaker @wangerlie The upsample_nearest_3d error is due to a missing registration of the half (float16) implementation. After adding the relevant implementation, a new error appears: CUDA out of memory.
  File "/data2/workspace/neiltian/onediff/src/onediff/infer_compiler/oneflow/utils.py", line 21, in wrapper
    return func(self, *args, **kwargs)
  File "/data2/workspace/neiltian/onediff/src/onediff/infer_compiler/utils/graph_management_utils.py", line 91, in wrapper
    ret = func(self, *args, **kwargs)
  File "/data2/workspace/neiltian/onediff/src/onediff/infer_compiler/oneflow/deployable_module.py", line 99, in forward
    output = dpl_graph(*args, **kwargs)
  File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 281, in __call__
    return self._dynamic_input_graph_cache(*args, **kwargs)
  File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/cache.py", line 115, in __call__
    return graph(*args, **kwargs)
  File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 284, in __call__
    self._compile(*args, **kwargs)
  File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 852, in _compile
    return self._compile_new(*args, **kwargs)
  File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 876, in _compile_new
    self.finish_compile_and_init_runtime()
  File "/data2/workspace/neiltian/oneflow/python/oneflow/nn/graph/graph.py", line 1428, in finish_compile_and_init_runtime
    self._c_nn_graph.init_runtime()
oneflow._oneflow_internal.exception.RuntimeError: Error: CUDA out of memory. Tried to allocate 292.4 GB
@neiltian-tencent Hi, I have used onediff to accelerate the Moore-AnimateAnyone demo (/MooreAnimateAnyone/scripts/pose2vid.py), and it doesn't seem to have any problem.
Could you tell me exactly how you use onediff to accelerate MooreAnimateAnyone, and which model you are trying to accelerate?
It would be great if you could provide me with your code.
convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'
@wangerlie The AnimateAnyone pipeline has two UNets (denoising_unet and reference_unet). compile_pipe may filter out both of them; this is the onediff filter list _PARTS:
_PARTS = [
    "text_encoder",
    "text_encoder_2",
    "image_encoder",
    "unet",
    "controlnet",
    "fast_unet",  # for deepcache
    "prior",  # for StableCascadePriorPipeline
    "decoder",  # for StableCascadeDecoderPipeline
    "vqgan.down_blocks",  # for StableCascadeDecoderPipeline
    "vqgan.up_blocks",  # for StableCascadeDecoderPipeline
    "vae.decoder",
    "vae.encoder",
]
After adding denoising_unet to _PARTS, the above error is reported.
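The filtering effect can be illustrated with a small stand-alone sketch. It does not import onediff: PARTS is a shortened copy of the list quoted above, recursive_getattr is a hypothetical re-implementation of a dotted-path lookup, and pipe is a fake object that only mimics the attribute names of an AnimateAnyone-style pipeline.

```python
from types import SimpleNamespace

# Shortened copy of the part names quoted above.
PARTS = ["text_encoder", "image_encoder", "unet", "controlnet", "vae.decoder", "vae.encoder"]


def recursive_getattr(obj, dotted_name, default=None):
    """Hypothetical helper: follow a dotted path, return default if any step is missing."""
    for name in dotted_name.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return default
    return obj


# Fake pipeline exposing the two UNets under non-standard names.
pipe = SimpleNamespace(
    denoising_unet=object(),
    reference_unet=object(),
    vae=SimpleNamespace(decoder=object(), encoder=object()),
)

# With the stock names, neither UNet is selected for compilation.
found = [p for p in PARTS if recursive_getattr(pipe, p) is not None]
print(found)

# After extending the list, both UNets are picked up.
extended = PARTS + ["denoising_unet", "reference_unet"]
found_extended = [p for p in extended if recursive_getattr(pipe, p) is not None]
print(found_extended)
```

Under these assumptions, a pipeline whose UNets are not named "unet" would have only its VAE parts compiled, which would be consistent with compile_pipe running without error but giving no speed-up.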
convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'
@wangerlie I compiled denoising_unet with oneflow_compile. With compile_pipe there is no acceleration effect, although the above error is not reported. What acceleration ratio did you measure?
I use compile_pipe to accelerate the Pose2VideoPipeline and find the same problem: it has no acceleration effect.
@neiltian-tencent Following your tips I reproduced the reported error and I am working on the problem. Thanks for your patience.
The two components 'denoising_unet' and 'reference_unet' can be compiled after adding the following code to compile_pipe:
def compile_pipe(
    pipe,
    *,
    backend="oneflow",
    options=None,
    ignores=(),
    fuse_qkv_projections=False,
):
    if fuse_qkv_projections:
        pipe = fuse_qkv_projections_in_pipe(pipe)

    if backend == "nexfort" and isinstance(options, str):
        import json

        options = json.loads(options)

    if backend == "nexfort" and options is not None and "memory_format" in options:
        memory_format = getattr(torch, options["memory_format"])
        pipe = convert_pipe_to_memory_format(
            pipe, ignores=ignores, memory_format=memory_format
        )
        del options["memory_format"]

    # To fix the bug of graph load of vae. Please refer to: https://github.com/siliconflow/onediff/issues/452
    if (
        hasattr(pipe, "upcast_vae")
        and pipe.vae.dtype == torch.float16
        and pipe.vae.config.force_upcast
    ):
        pipe.upcast_vae()

    filtered_parts = _filter_parts(ignores=ignores)
    for part in filtered_parts:
        obj = _recursive_getattr(pipe, part, None)
        if obj is not None:
            logger.info(f"Compiling {part}")
            _recursive_setattr(
                pipe, part, compile(obj, backend=backend, options=options)
            )

    if hasattr(pipe, "image_processor") and "image_processor" not in ignores:
        logger.info("Patching image_processor")

        from onediffx.utils.patch_image_processor import (
            patch_image_prcessor as patch_image_prcessor_,
        )

        patch_image_prcessor_(pipe.image_processor)

    if hasattr(pipe, "denoising_unet") and "denoising_unet" not in ignores:
        logger.info("Patching denoising_unet")
        obj = _recursive_getattr(pipe, "denoising_unet", None)
        _recursive_setattr(
            pipe, "denoising_unet", compile(obj, backend=backend, options=options)
        )

    if hasattr(pipe, "reference_unet") and "reference_unet" not in ignores:
        logger.info("Patching reference_unet")
        obj = _recursive_getattr(pipe, "reference_unet", None)
        _recursive_setattr(
            pipe, "reference_unet", compile(obj, backend=backend, options=options)
        )

    return pipe
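The two near-identical branches for denoising_unet and reference_unet could also be written as one loop. The following is only a sketch of that idea with stand-ins: compile_extra_parts and fake_compile are hypothetical names, and fake_compile replaces the real compile call so the snippet runs without onediff installed.

```python
from types import SimpleNamespace


def compile_extra_parts(pipe, names, compile_fn, ignores=()):
    """Compile each named attribute of pipe in place, skipping ignored or missing ones."""
    compiled = []
    for name in names:
        if name in ignores or not hasattr(pipe, name):
            continue
        setattr(pipe, name, compile_fn(getattr(pipe, name)))
        compiled.append(name)
    return compiled


def fake_compile(module):
    """Stand-in for the real compile call; it only tags the object."""
    return ("compiled", module)


# Fake pipeline with the two extra UNet attributes.
pipe = SimpleNamespace(denoising_unet="unet_a", reference_unet="unet_b")
done = compile_extra_parts(
    pipe,
    ["denoising_unet", "reference_unet"],
    fake_compile,
    ignores=("reference_unet",),
)
print(done)
```

In the real function, compile_fn would wrap compile(obj, backend=backend, options=options), and the ignores tuple would be the one passed to compile_pipe.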
The above problem is raised because the submodule 'BasicTransformerBlock' in 'reference_unet' and 'denoising_unet' can't get a proper proxy at this line: https://github.com/siliconflow/onediff/blob/a6d2e95ea369b99c50dbde7830c1e254f85433a9/src/onediff/infer_compiler/backends/oneflow/transform/builtin_transform.py#L95
@doombeaker I am trying to find an open-source demo. The error-related code is in def __getattribute__(self, attribute) (onediff/src/onediff/infer_compiler/transform/builtin_transform.py):
    elif attribute in ["forward", "_conv_forward"]:
        replacement = proxy_class(type(self._oflow_proxy_submod))
        return lambda *args, **kwargs: getattr(replacement, attribute)(
            self, *args, **kwargs
        )
@wangerlie Previous debugging records
Thanks for your feedback.
We will move to the nexfort backend to avoid conversion problems. Please take a look: https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions/examples/sdxl
@neiltian-tencent
Debug info:
Error info: onediff/src/onediff/infer_compiler/transform/builtin_transform.py:233 - convert <class 'function'> failed: Transform failed of <class 'function'>: 'function' object has no attribute '<locals>'
Function info:
<function ProxySubmodule.__getattribute__ at 0x7fedb847fe20>