mlc-ai / web-stable-diffusion

Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
https://mlc.ai/web-stable-diffusion
Apache License 2.0

Can I auto-tune SD models by myself? #39

Open felixslu opened 1 year ago

felixslu commented 1 year ago

```python
with tvm.transform.PassContext(opt_level=3):
    ex = relax.build(mod_deploy, args.target)
```

Here `args.target` is `"cuda"`, and TVM was installed with:

```
pip install -I mlc_ai_nightly_cu121 -f https://mlc.ai/wheels
```

But I get the errors below:

```
Traceback (most recent call last):
  File "web-stable-diffusion/build.py", line 184, in <module>
    build(mod, ARGS)
  File "web-stable-diffusion/build.py", line 151, in build
    ex = relax.build(mod_deploy, args.target)
  File "/usr/local/lib/python3.8/dist-packages/tvm/relax/vm_build.py", line 338, in build
    return _vmlink(builder, target, tir_mod, ext_libs, params, system_lib=system_lib)
  File "/usr/local/lib/python3.8/dist-packages/tvm/relax/vm_build.py", line 242, in _vmlink
    lib = tvm.build(
  File "/usr/local/lib/python3.8/dist-packages/tvm/driver/build_module.py", line 281, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: TVMFuncCall
  9: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS0_6ModuleERKNS0_3MapINS_6TargetENS_8IRModuleEvvEES7_EE17AssignTypedLambdaINS_UlSB_S7_E4_EEEvT_SsEUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SHSL
  8: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
  7: tvm::SplitMixedModule(tvm::IRModule, tvm::Target const&, tvm::Target const&)
  6: tvm::ApplyPasses(tvm::IRModule, tvm::transform::Sequential)
  5: tvm::transform::Pass::operator()(tvm::IRModule) const
  4: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  3: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  1: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  0: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS_8IRModuleES5_NS_9transform11PassContextEEE17AssignTypedLambdaIZNS_3tir9transform12VerifyMemoryEvEUlS5_S7_E_EEvT_EUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SFSJ
  Did you forget to bind?
    Variable `B` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `A` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `matmul` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `matmul` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `matmul` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
  File "/workspace/tvm/src/tir/analysis/verify_memory.cc", line 205
RuntimeError: Memory verification failed with the following errors:
# from tvm.script import tir as T
```

```python
@T.prim_func
def matmul20(A: T.Buffer((T.int64(2), T.int64(256), T.int64(1280)), "float32"), B: T.Buffer((T.int64(1280), T.int64(1280)), "float32"), matmul: T.Buffer((T.int64(2), T.int64(256), T.int64(1280)), "float32")):
    T.func_attr({"global_symbol": "matmul20", "op_pattern": 4, "target": T.target({"arch": "sm_86", "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""}, "keys": ["cuda", "gpu"], "kind": "cuda", "max_num_threads": 1024, "tag": "", "thread_warp_size": 32}), "tir.noalias": T.bool(True)})
    for i0, i1, i2, k in T.grid(2, 256, 1280, 1280):
        cse_var_2: T.int32 = i0 * 327680 + i1 * 1280
        cse_var_1: T.int32 = cse_var_2 + i2
        matmul_1 = T.Buffer((T.int64(655360),), data=matmul.data)
        if k == 0:
            matmul_1[cse_var_1] = T.float32(0)
        A_1 = T.Buffer((T.int64(655360),), data=A.data)
        B_1 = T.Buffer((T.int64(1638400),), data=B.data)
        matmul_1[cse_var_1] = matmul_1[cse_var_1] + A_1[cse_var_2 + k] * B_1[k * 1280 + i2]
```

felixslu commented 1 year ago

I wonder whether we could get the tuning script from your team, so that I can tune SD models myself.

For example, the links below: https://github.com/mlc-ai/mlc-llm/commit/8aeb3dfe9ff07b04331cc0ed6fdc7c3ee384e382#diff-643d01e2455cf9344c3c81c40c42c8d6aad9cd7ad207aa72712c0b1556c2d014 (`mlc_llm/tuning.py`)

TigerVersusT commented 1 year ago

@felixslu any solution to this problem? I get the same error.