siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0
1.57k stars, 94 forks

Check failed: invalid configuration argument when running StableVideoDiffusionPipeline with a large resolution #1018

Open darkbridge opened 1 month ago

darkbridge commented 1 month ago

Describe the bug

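When running the StableVideoDiffusionPipeline example, I set the warmup resolution to 1024*576, while the input image used in the actual test is 2400*1080. During inference, "Check failed: invalid configuration argument" is printed repeatedly and the program hangs without making progress. The problem does not occur with small-resolution images.

Does onediff not support high-resolution images?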

Your environment

OS

CentOS

OneDiff git commit id

OneFlow version info if you have installed oneflow

Run python -m oneflow --doctor and paste it here.
path: ['/home/local/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow']
version: 0.9.1.dev20240515+cu122
git_commit: ec7b682
cmake_build_type: Release
rdma: True
mlir: True
enterprise: False

How To Reproduce

Steps to reproduce the behavior(code or script):

The complete error message

Additional context

strint commented 1 month ago

Which example are you running?

As far as I recall, "invalid configuration argument" is not an error raised inside onediff. Could you share a more complete error stack?

darkbridge commented 1 month ago

F20240719 16:29:14.029467 120179 cutlass_conv_tuner_impl.cpp:123] Check failed: cudaEventSynchronize(end) : an illegal memory access was encountered (700)
Check failure stack trace:
@ 0x7fa1532751ca google::LogMessage::Fail()
@ 0x7fa153278101 google::LogMessage::SendToLog()
@ 0x7fa153274cf9 google::LogMessage::Flush()
@ 0x7fa1532789e9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa14bbc0a3e oneflow::CutlassConvTunerImpl<>::Find()
@ 0x7fa14baf4902 oneflow::CutlassConv2dEngine::Init()
@ 0x7fa14baeaf41 oneflow::Conv2dEngineMgr::GetConv2dEngine()
@ 0x7fa14a5e61ab _ZZNK7oneflow12_GLOBAL__N_122Conv2dTuningWarmupPass5ApplyEPNS_3JobEPNS_10JobPassCtxEENKUlPKNS_6OpNodeEE1clES8
@ 0x7fa14a5e7cce oneflow::(anonymous namespace)::Conv2dTuningWarmupPass::Apply()
@ 0x7fa14a415e74 _ZZN7oneflow23LazyJobBuildAndInferCtx8CompleteEvENKUlRKSsiE2_clES2_i
@ 0x7fa14a41b59d oneflow::LazyJobBuildAndInferCtx::Complete()
@ 0x7fa247125166 oneflow::CurJobBuildAndInferCtx_Complete()
@ 0x7fa247125fbb (unknown)
@ 0x7fa246e7ab48 (unknown)
@ 0x561241833fa4 cfunction_call
@ 0x5612417f65d4 _PyObject_MakeTpCall.localalias.3
@ 0x561241899d75 _PyEval_EvalFrameDefault
@ 0x561241843742 _PyEval_Vector
@ 0x561241843c9b method_vectorcall
@ 0x5612417fc03b _PyObject_Call.localalias.1
@ 0x561241897774 _PyEval_EvalFrameDefault
@ 0x561241843742 _PyEval_Vector
@ 0x561241843c9b method_vectorcall
@ 0x5612417fc03b _PyObject_Call.localalias.1
@ 0x561241897774 _PyEval_EvalFrameDefault
@ 0x561241843742 _PyEval_Vector
@ 0x561241843c9b method_vectorcall
@ 0x5612417fc03b _PyObject_Call.localalias.1
@ 0x561241897774 _PyEval_EvalFrameDefault
@ 0x561241843742 _PyEval_Vector
@ 0x561241843c9b method_vectorcall
@ 0x5612417fc03b _PyObjectCall.localalias.1

Stack trace (most recent call last):
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/oneflow_internal.cpython-310-x86_64-linux-gnu.so", at 0x7fa246e7ab47, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/_oneflow_internal.cpython-310-x86_64-linux-gnu.so", at 0x7fa247125fba, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/_oneflow_internal.cpython-310-x86_64-linux-gnu.so", at 0x7fa247125165, in CurJobBuildAndInferCtx_Complete()
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14a41b59c, in LazyJobBuildAndInferCtx::Complete()
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14a415e73, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14a5e7ccd, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14a5e61aa, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14baeaf40, in Conv2dEngineMgr::GetConv2dEngine(ep::CudaStream, Conv2dConfig const&, Conv2dArguement const&, std::string const&)
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14baf4901, in CutlassConv2dEngine::Init(ep::CudaStream, Conv2dConfig const&, Conv2dArguement const&)
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa14bbc0a3d, in CutlassConvTunerImpl<cutlass::library::Conv2dConfiguration, cutlass::library::ConvArguments>::Find(ep::CudaStream, cutlass::library::ConvFunctionalKey, cutlass::library::Conv2dConfiguration const&, cutlass::library::ConvArguments const&, void, unsigned long)
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa1532789e8, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa153274cf8, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa153278100, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa1532751c9, in
Object "/home/work/miniforge3/envs/svd/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-10b6a2f2.so", at 0x7fa1433e82fa, in

Aborted (Signal sent by tkill() 120179 10000)

The error above is reported after warmup has completed successfully.

strint commented 1 month ago

Please check the GPU memory usage at the moment the error occurs, to see whether the memory is full and this is actually an OOM.
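One way to watch GPU memory while the pipeline runs is to poll nvidia-smi from a second process. The sketch below is only an illustration: the helper names and the 0.95 threshold are assumptions made for this example, not part of onediff or OneFlow.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs (one GPU per line) into a list of tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage in MiB (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def near_capacity(usage, threshold=0.95):
    """Return indices of GPUs whose used/total ratio reaches the threshold."""
    return [i for i, (used, total) in enumerate(usage) if used / total >= threshold]
```

Calling `near_capacity(query_gpu_memory())` in a loop while inference is running gives a rough signal of whether the card is close to full just before the crash.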

strint commented 1 month ago

Please also share the device model so that we can try to reproduce this.

strint commented 1 month ago

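You can try changing the warmup approach: warm up with the largest resolution first, then run the smaller resolutions afterwards. (Recommended.)

Another thing you can try is setting the environment variable ONEFLOW_CONV2D_KERNEL_ENABLE_TUNING_WARMUP to 0. (Less recommended: it may make inference somewhat more expensive when the resolution changes.)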
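The "largest resolution first" warmup order can be sketched as follows. This is a minimal illustration, not onediff code: `run_pipeline` is a hypothetical callback that would wrap one call of the compiled pipeline at the given width and height.

```python
def warmup_order(resolutions):
    """Deduplicate (width, height) pairs and sort them by pixel count, largest first."""
    unique = list(dict.fromkeys(resolutions))  # keeps first-seen order for ties
    return sorted(unique, key=lambda wh: wh[0] * wh[1], reverse=True)


def warm_up(run_pipeline, resolutions):
    """Call run_pipeline(width, height) once per resolution, largest first."""
    order = warmup_order(resolutions)
    for width, height in order:
        run_pipeline(width, height)
    return order
```

For the resolutions mentioned in this issue, `warm_up(run_pipeline, [(1024, 576), (2400, 1080)])` would run 2400*1080 before 1024*576.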

peng25zhang commented 1 month ago

This was run on an A800. Does onediff increase memory overhead? At 2400*1080 the pipeline runs without onediff, but fails once onediff is used. @strint

strint commented 1 month ago

We will test this resolution and see.

@marigoold please arrange for someone to look into this.

strint commented 1 month ago

@marigoold could you post a summary?

peng25zhang commented 1 month ago

Could you tell me whether this problem can be fixed?

marigoold commented 1 month ago

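Hello, we have found the cause of this behavior and are working on a fix. In the meantime, you can use export ONEFLOW_CONV2D_KERNEL_ENABLE_TUNING_WARMUP=0 as a temporary workaround and check whether any problem remains. Also, if compiling the vae reports an error as well, you can pass ignores=["vae"] to compile_pipe.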
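The same workaround can also be applied from inside a Python script instead of the shell. The sketch below rests on an assumption: that setting the variable before oneflow and onediff are imported is early enough for it to take effect. The helper name is hypothetical.

```python
import os

TUNING_WARMUP_VAR = "ONEFLOW_CONV2D_KERNEL_ENABLE_TUNING_WARMUP"


def disable_conv2d_tuning_warmup(env=os.environ):
    """Set the variable to "0" and return its previous value (None if unset)."""
    previous = env.get(TUNING_WARMUP_VAR)
    env[TUNING_WARMUP_VAR] = "0"
    return previous


# Assumption: call this before importing oneflow / onediff so the setting
# is visible when the library reads its environment.
disable_conv2d_tuning_warmup()

# Afterwards, compile as suggested in this comment (not executed here):
#   pipe = compile_pipe(pipe, ignores=["vae"])
```

Returning the previous value makes it easy to restore the original setting once the fix is released.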

peng25zhang commented 1 month ago

@marigoold Hello, I tried this workaround and it still does not work.

marigoold commented 4 weeks ago

Is it still the same error?

peng25zhang commented 4 weeks ago

@marigoold Yes, it is the same error.

peng25zhang commented 1 week ago

@strint @marigoold Hello, has this problem been resolved?