lshqqytiger / ZLUDA

CUDA on AMD GPUs
Apache License 2.0

Crash when trying to use training in Applio with Zluda 3.7.2 #28

Closed AznamirWoW closed 1 month ago

AznamirWoW commented 2 months ago

Using Applio 3.2.1

1) modified the install script to use cu118 libraries

pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118

2) downloaded Zluda 3.7.2, patched torch under env/Lib/site-packages/torch/lib using 3 dlls.

3) modified Applio code to add, wherever it imports torch:

torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)

4) Ran the training process

thread '<unnamed>' panicked at zluda_rtc\src\lib.rs:34:16:
[ZLUDA] HIPRTC failed: 11
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
thread '<unnamed>' panicked at library\panic_unwind\src\seh.rs:260:8:
Rust panics cannot be copied
stack backtrace:
0: 0x7ff83202601a - nvrtcGetLoweredName 1: 0x7ff832032c8b - nvrtcGetLoweredName 2: 0x7ff8320247d1 - nvrtcGetLoweredName 3: 0x7ff832025e06 - nvrtcGetLoweredName 4: 0x7ff83202761f - nvrtcGetLoweredName 5: 0x7ff8320272b7 - nvrtcGetLoweredName 6: 0x7ff832027b5d - nvrtcGetLoweredName 7: 0x7ff8320279db - nvrtcGetLoweredName 8: 0x7ff8320266a9 - nvrtcGetLoweredName 9: 0x7ff8320276d6 - nvrtcGetLoweredName 10: 0x7ff832037527 - nvrtcGetLoweredName 11: 0x7ff83202ab9e - nvrtcGetLoweredName 12: 0x7ff83de02255 - std::_Init_locks::operator= 13: 0x7ff83de01fb8 - std::_Init_locks::operator= 14: 0x7ff83de024c9 - ExceptionPtrCurrentException 15: 0x7ffff58ebeed - THPPointer<_frame>::operator bool 16: 0x7ffff62afd7b - c10d::PythonCommHook::runHook 17: 0x7ff849711080 - 18: 0x7ff8497126a5 - _NLG_Return2 19: 0x7ff84f811c96 - RtlCaptureContext2 20: 0x7ffff58ef44b - c10::ivalue::Future::devices 21: 0x7ff81fe682f6 - cfunction_call at \objects\methodobject.c:543 22: 0x7ff81fe2554c - _PyObject_MakeTpCall at \objects\call.c:215 23: 0x7ff81fe278f1 - method_vectorcall at \objects\classobject.c:83 24: 0x7ff81fe8fc5a - slot_tp_call at \objects\typeobject.c:7497 25: 0x7ff81fe2554c - _PyObject_MakeTpCall at \objects\call.c:215 26: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 27: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 28: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault at \python\ceval.c:4213 29: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 30: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 31: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 32: 0x7ff81fe277d2 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 33: 0x7ff81fe277d2 - 
method_vectorcall at \objects\classobject.c:53 34: 0x7ff81fe2566d - PyVectorcall_Call at \objects\call.c:267 35: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 36: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 37: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 38: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 39: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 40: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 41: 0x7ff81fe277d2 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 42: 0x7ff81fe277d2 - method_vectorcall at \objects\classobject.c:53 43: 0x7ff81fe2566d - PyVectorcall_Call at \objects\call.c:267 44: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 45: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 46: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 47: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 48: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 49: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 50: 0x7ff81fe25391 - _PyObject_FastCallDictTstate at \objects\call.c:153 51: 0x7ff81fe25ad2 - _PyObject_Call_Prepend at \objects\call.c:431 52: 0x7ff81fe8fc0c - slot_tp_call at \objects\typeobject.c:7494 53: 0x7ff81fe2554c - _PyObject_MakeTpCall at \objects\call.c:215 54: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 55: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 56: 0x7ff81ff1af08 - _PyEval_EvalFrameDefault at \python\ceval.c:4231 57: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 58: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 59: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 60: 0x7ff81fe277d2 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 61: 0x7ff81fe277d2 - method_vectorcall at \objects\classobject.c:53 62: 0x7ff81fe2566d - PyVectorcall_Call at \objects\call.c:267 63: 
0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 64: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 65: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 66: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 67: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 68: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 69: 0x7ff81fe277d2 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 70: 0x7ff81fe277d2 - method_vectorcall at \objects\classobject.c:53 71: 0x7ff81fe2566d - PyVectorcall_Call at \objects\call.c:267 72: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 73: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 74: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 75: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 76: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 77: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 78: 0x7ff81fe25391 - _PyObject_FastCallDictTstate at \objects\call.c:153 79: 0x7ff81fe25ad2 - _PyObject_Call_Prepend at \objects\call.c:431 80: 0x7ff81fe8fc0c - slot_tp_call at \objects\typeobject.c:7494 81: 0x7ff81fe2554c - _PyObject_MakeTpCall at \objects\call.c:215 82: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 83: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 84: 0x7ff81ff1af08 - _PyEval_EvalFrameDefault at \python\ceval.c:4231 85: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 86: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 87: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 88: 0x7ff81fe27669 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 89: 0x7ff81fe278f1 - method_vectorcall at \objects\classobject.c:83 90: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 91: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 92: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at 
\python\ceval.c:4277 93: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 94: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 95: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 96: 0x7ff81fe27669 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 97: 0x7ff81fe278f1 - method_vectorcall at \objects\classobject.c:83 98: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 99: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 100: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 101: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 102: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 103: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 104: 0x7ff81fe253b4 - _PyObject_FastCallDictTstate at \objects\call.c:142 105: 0x7ff81fe25ad2 - _PyObject_Call_Prepend at \objects\call.c:431 106: 0x7ff81fe8fc0c - slot_tp_call at \objects\typeobject.c:7494 107: 0x7ff81fe257a7 - _PyObject_Call at \objects\call.c:305 108: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 109: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 110: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 111: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 112: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 113: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 114: 0x7ff81fe27669 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 115: 0x7ff81fe278f1 - method_vectorcall at \objects\classobject.c:83 116: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 117: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 118: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 119: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 120: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 121: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 122: 
0x7ff81fe27669 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 123: 0x7ff81fe278f1 - method_vectorcall at \objects\classobject.c:83 124: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 125: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 126: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 127: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 128: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 129: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 130: 0x7ff81fe27669 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 131: 0x7ff81fe278f1 - method_vectorcall at \objects\classobject.c:83 132: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 133: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 134: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 135: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 136: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 137: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 138: 0x7ff81fe253b4 - _PyObject_FastCallDictTstate at \objects\call.c:142 139: 0x7ff81fe25ad2 - _PyObject_Call_Prepend at \objects\call.c:431 140: 0x7ff81fe8fc0c - slot_tp_call at \objects\typeobject.c:7494 141: 0x7ff81fe2554c - _PyObject_MakeTpCall at \objects\call.c:215 142: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 143: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 144: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault at \python\ceval.c:4213 145: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 146: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 147: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 148: 0x7ff81ff161a9 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 149: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 150: 0x7ff81ff1e6f2 - call_function at 
\python\ceval.c:5893 151: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault at \python\ceval.c:4213 152: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 153: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 154: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 155: 0x7ff81ff1e8f2 - PyObject_Call at \objects\call.c:317 156: 0x7ff81ff1e8f2 - do_call_core at \python\ceval.c:5945 157: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault at \python\ceval.c:4277 158: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 159: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 160: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 161: 0x7ff81ff161a9 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 162: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 163: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 164: 0x7ff81ff1aeb5 - _PyEval_EvalFrameDefault at \python\ceval.c:4198 165: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 166: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 167: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 168: 0x7ff81ff161a9 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 169: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 170: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 171: 0x7ff81ff1aeb5 - _PyEval_EvalFrameDefault at \python\ceval.c:4198 172: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 173: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 174: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 175: 0x7ff81ff161a9 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 176: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 177: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 178: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault at \python\ceval.c:4213 179: 
0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 180: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 181: 0x7ff81fe2585e - _PyFunction_Vectorcall at \objects\call.c:347 182: 0x7ff81ff161a9 - _PyObject_VectorcallTstate at \include\cpython\abstract.h:114 183: 0x7ff81ff1e6f2 - PyObject_Vectorcall at \include\cpython\abstract.h:123 184: 0x7ff81ff1e6f2 - call_function at \python\ceval.c:5893 185: 0x7ff81ff1af08 - _PyEval_EvalFrameDefault at \python\ceval.c:4231 186: 0x7ff81ff1ce7b - _PyEval_EvalFrame at \include\internal\pycore_ceval.h:46 187: 0x7ff81ff1ce7b - _PyEval_Vector at \python\ceval.c:5067 188: 0x7ff81ff178c2 - PyEval_EvalCode at \python\ceval.c:1134 189: 0x7ff81ff8e08e - run_eval_code_obj at \python\pythonrun.c:1291 190: 0x7ff81ff8e168 - run_mod at \python\pythonrun.c:1312 191: 0x7ff81ff8dc39 - PyRun_StringFlags at \python\pythonrun.c:1183 192: 0x7ff81ff8c21b - PyRun_SimpleStringFlags at \python\pythonrun.c:503 193: 0x7ff81fda8ef7 - pymain_run_command at \modules\main.c:252 194: 0x7ff81fda8ef7 - pymain_run_python at \modules\main.c:582 195: 0x7ff81fda9e93 - Py_RunMain at \modules\main.c:670 196: 0x7ff81fda9e93 - pymain_main at \modules\main.c:1066 197: 0x7ff81fda9f06 - Py_Main at \modules\main.c:1078 198: 0x7ff694f11494 - invoke_main at d:\agent_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:90 199: 0x7ff694f11494 - scrt_common_main_seh at d:\agent_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 200: 0x7ff84ea37374 - BaseThreadInitThunk 201: 0x7ff84f7bcc91 - RtlUserThreadStart
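
For anyone reproducing this, the four settings from step 3 can be gathered into one helper instead of being pasted after every torch import. This is only a sketch: the function name apply_zluda_workarounds is made up for illustration and is not part of Applio, ZLUDA, or PyTorch. It takes the already-imported torch module as an argument and applies exactly the settings listed in step 3.

```python
def apply_zluda_workarounds(torch_module):
    """Apply the backend settings from step 3 to an imported torch module.

    Hypothetical helper (not part of Applio, ZLUDA, or PyTorch).
    """
    # Turn cuDNN off, as in step 3.
    torch_module.backends.cudnn.enabled = False
    # Keep only the plain "math" scaled-dot-product attention path.
    torch_module.backends.cuda.enable_flash_sdp(False)
    torch_module.backends.cuda.enable_math_sdp(True)
    torch_module.backends.cuda.enable_mem_efficient_sdp(False)
    return torch_module


# Usage, once near program start-up:
#   import torch
#   apply_zluda_workarounds(torch)
```

Calling it once at start-up should be enough, since these are process-wide flags on the torch module rather than per-file settings. As the backtrace above shows, these settings alone did not prevent the crash.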

AznamirWoW commented 2 months ago

Found that removing the "@torch.jit.script" decorator prevents the crash.
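
Rather than deleting the decorator by hand, it could be made switchable. The sketch below is an assumption-laden illustration, not Applio code: the helper name jit_script_unless_disabled and the environment variable DISABLE_JIT_SCRIPT are both invented here. The scripting function (for example torch.jit.script) is passed in as an argument, and the function is left as plain Python when the variable is set to "1".

```python
import os


def jit_script_unless_disabled(script_fn, env_var="DISABLE_JIT_SCRIPT"):
    """Return a decorator that applies script_fn unless env_var is "1".

    Hypothetical helper; env_var is an invented name, not a PyTorch setting.
    """
    def decorator(fn):
        if os.environ.get(env_var) == "1":
            # Leave the plain Python function in place (same effect as
            # removing the decorator from the source).
            return fn
        return script_fn(fn)
    return decorator


# Usage in place of a bare @torch.jit.script:
#   import torch
#
#   @jit_script_unless_disabled(torch.jit.script)
#   def some_function(x):
#       return x * 2
```

PyTorch also documents a PYTORCH_JIT=0 environment variable that disables script and trace annotations globally. Whether that avoids this particular HIPRTC failure was not tested in this thread.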


lshqqytiger commented 1 month ago

34