@jinchen62 I can't assign you because you're not in the org, but this is what we were discussing you engaging on.
Related issue https://github.com/openxla/iree/issues/15665
To reproduce this error with the prebuilt binary:
iree-run-module \
--device=local-task \
--module="/nodclouddata/chi/src/SHARK/chatglm.vmfb" \
--function=forward \
--input="1x4xi64=0"
EXEC @forward
result[0]: hal.buffer_view
1x4x65024xf16=[[NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN 
NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN...][...][...][...]]
Further debug steps:
iree-compiler 20231211.611
iree-runtime 20231211.611
The original chatglm-6b-int4.mlir is 6.4 GB. To generate all the dispatch .mlir files for debugging, change chatglm.py line 170 to:
path = shark_module.save_module(
    "./",
    "chatglm",
    extra_args=["--iree-hal-dump-executable-sources-to=/nodclouddata/chi/src/SHARK/nan/dispatch/2"],
    debug=debug,
)
Then run python chatglm.py
Or run the equivalent iree-compile command directly:
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-hal-dump-executable-sources-to=/nodclouddata/chi/src/SHARK/nan/dispatch/2
After running each dispatch, the NaN issue first shows up in module_forward_dispatch_9.mlir. To generate a chatglm.vmfb that breaks at module_forward_dispatch_9.mlir:
path = shark_module.save_module(
    "./",
    "chatglm",
    extra_args=["--iree-flow-break-dispatch=@forward:9"],
    debug=debug,
)
Then run python chatglm.py, or run the equivalent iree-compile command directly:
/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-flow-break-dispatch=@forward:9
This produces the 518.3 MB chatglm.vmfb.
To run the chatglm.vmfb that breaks at module_forward_dispatch_9.mlir:
iree-run-module \
--device=local-task \
--module="/nodclouddata/chi/src/SHARK/chatglm.vmfb" \
--function=forward \
--input="1x4xi64=1"
The output is
EXEC @forward
result[0]: hal.buffer_view
4x32x32xf16=[[-NAN 0.142822 0.132812 0.201294 0.0273285 -0.0444946 0.0615845 0.0222778 0.187256 -0.0736694 0.178589 0.141602 0.267578 -0.179321 -0.114014 0.188354 0.59375 0.647949 -0.217773 -0.503906 -0.388672 0.281006 0.0260773 1.03125 -0.384766 -0.44751 -0.357178 0.78125 1.03125 -0.484375 -0.445557 0.597656][-NAN 0.0149841 -0.202393 -0.0646973 -0.0947876 0.0257111 0.152954 -0.312988 0.164185 -0.301025 0.102722 0.269531 0.29541 0.0250854 -0.601562 0.660156 0.949219 0.508301 0.030426 -1.25781 0.320068 0.363281 -0.443359 1.88281 0.0531006 -0.734375 0.106262 0.765625 0.984375 -0.757812 -0.679688 0.396484][-NAN -0.377441 -0.0886841 -0.143555 0.0489197 -0.0198364 0.0590515 0.121643 -0.0209961 -0.157227 0.27124 0.306641 0.174561 -0.405762 -0.613281 0.531738 1.05469 0.535156 0.131714 -1.2959 0.702637 0.287354 -0.252441 2.01562 -0.169434 -0.474609 -0.255859 0.972656 1.05469 -0.734375 -0.722656 0.566406][-NAN 0.406738 0.384521 0.460938 -0.2229 0.0623779 0.00845337 0.126343 -0.142822 -0.105225 -0.104553 0.192627 0.00684357 -0.427246 0.0197601 -0.00178814 0.480713 0.574219 -0.322266 -0.554688 -0.0859985 0.0394897 0.304932 1.54688 -0.246704 -0.396484 -0.257568 0.910156 0.914551 -0.65332 -0.535645 0.605957][-NAN -0.418457 -0.402832 -0.480713 0.0939941 -0.0409241 0.30249 -0.0563049 0.15918 -0.0656738 0.209595 0.157959 0.251953 0.0300598 -0.205811 0.325928 0.308594 0.347656 0.33374 -0.671875 0.43457 0.663086 -0.179443 0.941406 0.197021 -0.231445 -0.310303 0.406006 0.746094 -0.277344 -0.31665 0.263916][-NAN 0.108398 0.976562 0.992188 -0.707031 -0.459229 -0.108398 1.21875 0.566406 1.47656 0.648438 0.671875 0.929688 0.65625 -1.15625 0.376953 0.960938 -0.263672 1.04688 -1.27344 0.28125 0.0374451 -0.219727 1.57812 0.455078 -0.566406 0.194458 0.382812 0.761719 -0.28125 -0.71875 0.18457][-NAN 0.218872 0.198364 0.351562 -0.138916 0.194336 0.00869751 -0.0662231 0.174927 -0.143433 0.121643 0.145752 0.0852051 -0.3396 -0.230835 0.3125 0.400635 0.507812 -0.310547 -0.617188 0.190308 -0.125977 0.181763 1.53906 -0.417969 -0.363037 -0.118103 1 0.707031 -0.621094 -0.65625 0.442871][-NAN 0.242432 -0.172485 -0.0402527 -0.158447 -0.0847778 0.19043 0.000880241 0.300537 -0.0539551 0.269775 0.0354309 0.258057 -0.121643 -0.163818 0.256104 0.828613 0.535156 -0.0917358 -0.554688 0.0227509 0.523438 -0.273438 0.672363 0.153564 -0.202271 -0.429688 0.463379 0.737793 -0.60498 -0.177979 0.758301][-NAN 0.16687 -0.049469 0.0241852 -0.168091 -1.07812 0.953125 1.00781 0.984375 0.851562 1.375 1.01562 1.4375 1.32812 -1.49219 0.667969 1.23438 -0.408203 1.47656 -2.23438 1.17188 0.65625 -0.546875 1.6875 0.976562 -0.550781 1.15625 -0.126953 1 -0.0634766 -1.09375 0.0656738][-NAN 0.0722046 -0.0282593 -0.0578003 0.0189056 -0.0503235 -0.0721436 -0.245239 -0.0542908 -0.30835 0.0949707 0.0890503 0.503418 -0.294922 -0.0880737 -0.102173 0.546387 0.570312 -0.406494 -0.66748 -0.461182 -0.0996704 0.20105 1.55469 -0.293213 -0.543457 -0.464844 0.964844 0.753418 -0.691406 -0.726562 0.550781][-NAN 0.0623474 -0.0394592 -0.0491028 -0.109253 -0.00532532 -0.00837708 -0.0166473 0.0957642 -0.0270538 0.253906 0.443115 0.476562 0.251953 -0.66748 0.351562 1.03125 0.245117 0.32373 -0.913086 0.651855 0.339844 -0.0430298 1.13965 0.253662 -0.443115 -0.400391 0.715332 0.871094 -0.765625 -0.373047 0.976074][-NAN -0.107117 -0.0581665 -0.162354 -0.0953979 -0.192139 0.2771 -0.173218 0.613281 0.0386353 0.486084 0.209839 0.773438 -0.156982 -0.0804443 0.19519 0.91748 0.921875 -0.349854 -0.714355 -0.102722 -0.243286 0.0998535 2.09375 -0.378662 -0.156128 -0.394287 0.515625 0.354004 
-0.265869 -0.277588 0.491211][-NAN -0.155396 -0.106873 -0.126221 0.0148773 0.0980835 -0.146851 -0.0118637 0.009552 0.078125 0.221802 -0.013031 -0.110718 -0.176758 -0.128174 0.153442 0.652344 0.566406 -0.490479 -0.496094 0.0133133 -0.204956 -0.100708 1.32031 -0.809082 -0.421631 -0.398682 1.11719 0.663574 -0.898438 -0.554199 0.617188][-NAN -0.137329 0.0624084 0.424072 0.0146179 -1.21875 0.0838623 1.03223 1.51562 1.05566 1.14844 0.859375 1.46094 0.404297 -1.5 0.289307 1.39062 0.367676 1.96875 -1.41406 0.628418 0.296875 -0.174927 1.4375 0.999512 -0.0878906 -0.0773315 0.169678 1.20312 -0.226929 -0.371094 0.211792][-NAN 0.209229 0.213989 0.291016 -0.0252075 0.0462341 -0.00183678 0.0205231 0.103271 -0.152344 0.106689 -0.00227547 0.0877686 -0.335449 0.100769 0.0362244 0.308838 0.496338 -0.285156 -0.449463 -0.365479 0.0715942 0.10907 1.07812 -0.0349426 -0.400146 -0.386719 0.898438 0.839844 -0.542969 -0.5 0.734375][-NAN 0.158203 0.113281 0.0916748 -0.314697 -0.23645 0.371338 0.273682 0.29126 0.233398 0.342041 0.120728 0.359375 0.135864 -0.281494 0.0345154 0.6875 0.695312 -0.478271 -0.777344 -0.150269 0.0703125 0.257812 1 -0.39624 -0.28125 -0.394531 0.808105 1.17188 -0.718262 -0.427734 0.629395][-NAN 0.0286102 0.0839233 -0.0317993 0.0648804 0.0410461 -0.119385 -0.0699463 -0.0432739 -0.0532837 -0.0927734 0.102051 -0.0171661 -0.0993652 -0.523438 0.0148392 -0.135376 -0.485107 -0.266357 -0.870117 -1.28906 1.11035 -1.53906 -1.54785 0.203857 -0.368896 -1.01562 1.50879 1.21777 1.34375 -0.326904 0.0969849][-NAN 0.0119705 0.15625 -0.078125 0.0838623 0.117615 0.0111084 0.0110474 0.114685 -0.0796509 -0.0977783 -0.0163574 -0.0653076 -0.126831 -0.0888672 0.347656 -0.0478821 0.196533 0.474609 0.0300446 0.259277 -0.0598145 -0.271484 -0.63623 0.551758 -1.09375 -0.999512 0.929688 0.520508 1.22559 0.395752 0.431396][-NAN 0.0478516 0.140869 -0.114929 0.174805 0.211914 -0.0474243 -0.017868 0.0183258 -0.271484 -0.0900269 -0.0342712 -0.476318 0.213745 0.401367 0.0577393 0.635742 1.01562 -0.858398 0.175293 0.157837 1.2793 0.0521545 -1.16309 0.342041 -0.451904 -0.809082 1.14746 0.90625 0.898438 0.476318 0.0497742][-NAN 0.0640259 0.0469971 -0.10199 0.17395 -0.0628052 0.169678 0.0995483 -0.0897217 -0.0575867 0.488525 -0.302246 -0.0686646 0.427246 0.589355 -0.454834 0.939941 1.02246 -1.1709 0.296387 -0.130249 1.66309 0.376465 -1.26367 0.287842 -0.722656 -0.776855 1.31934 1.125 0.648438 -0.0335388 -0.0169678][-NAN -0.00254059 0.0898438 -0.0976562 0.28125 -0.069519 -0.115906 0.214966 0.190796 0.26416 0.0657959 0.394043 -0.27124 -0.190796 -0.776855 -0.547852 0.22998 1.0625 -1.14062 0.730469 0.967773 0.613281 -0.394287 -1.19531 -0.0442505 -0.391602 -0.119019 1.125 -0.135498 1.59277 0.535645 0.895508][-NAN -0.0597229 -0.071167 0.0662231 -0.0159149 -0.133789 0.227051 0.224487 0.36499 0.00257683 -0.111328 -0.680176 -0.569824 1.18066 0.566895 -2.87305 -0.145508 1.56348 -2.5 -0.0403442 1.41406 2.31055 0.550293 -1.30469 -0.894531 -0.69873 -1.20215 1.96777 -0.270996 1.58594 1.05371 1.0625][-NAN 0.324951 0.400879 -0.443359 0.0141144 0.854004 -0.370117 -0.634766 -1.17969 -0.693359 -1.47168 -0.571777 0.0400085 1.77441 0.83252 0.213135 2.2207 1.91602 -1.74219 1.85938 -0.00273514 3.35742 0.19751 -1.52441 -1.39062 0.973145 0.28833 1.13281 -0.520508 0.650879 0.53125 0.70166][-NAN 0.447266 0.277344 -0.310303 -0.0130539 0.466797 -0.240845 -0.660156 -0.695312 -0.390625 -1.30469 -0.131836 0.071106 0.244385 -0.296875 0.757324 1.17871 0.486328 -0.820312 0.683105 -0.526367 1.99121 0.130249 -1.24121 -0.4375 0.0282288 -0.688477 1.16406 0.593262 0.910156 
-0.0026207 0.203125][-NAN -0.0357056 0.0184784 -0.0187378 0.0130081 -0.0167542 -0.0133667 -0.0304565 0.0339355 0.0802612 0.0080719 -0.25415 -0.120972 0.373535 0.292236 -0.152222 0.428711 0.916992 -0.752441 0.453125 0.504395 0.974121 0.253906 -1.01367 0.0643921 -0.903809 -1.17188 1.40527 0.62207 1.11621 0.479248 0.315674][-NAN 0.0445557 0.259766 -0.143677 0.180664 0.206177 -0.279053 0.078125 0.129028 -0.233643 0.380859 0.255859 -1.125 -0.59668 1.29688 -0.35376 0.746582 1.75684 -1.10938 0.652344 0.0236053 2.10938 0.203125 -1.55469 -0.265869 -0.644531 -0.566895 0.973145 0.291016 0.996582 0.297119 0.574707][-NAN -0.297119 -0.030365 0.026535 0.0256805 0.0462036 -0.316162 -0.402588 0.194458 -0.162231 0.166016 0.32251 0.0821533 -0.155151 0.343994 -0.206421 0.151733 0.558594 -0.289062 -0.202026 -0.00947571 0.878418 -0.558105 -0.788574 0.519531 -0.186401 -0.182739 0.749512 0.341797 0.54248 0.0967407 -0.0722656][-NAN -0.041748 -0.00181484 -0.0288239 0.0518799 0.0551147 0.00549316 0.0603943 -0.00839996 0.123169 -0.226807 -0.19812 -0.0264587 0.171021 0.229248 -0.0961304 0.572754 1.03809 -0.81543 0.386963 0.70752 0.901367 0.213745 -1.0459 0.0230408 -0.837402 -0.825195 1.46875 0.770996 1.15625 0.296387 0.325928][-NAN 0.00265312 0.429688 0.269531 -0.367188 0.609375 -0.0617676 -1.53125 1.35938 -0.357422 -1.375 -0.996094 -0.314209 0.212036 1.23438 0.161865 1.64844 1.75 -1.49219 0.625 0.110413 2.1875 0.318359 -1.09375 -0.294922 0.171143 0.0635986 0.621094 0.199341 0.302734 0.134766 -0.0419617][-NAN -0.198364 -0.183716 0.275391 -0.0667114 -0.0384216 -0.116394 -0.30127 -0.742188 -0.457031 -0.045105 -0.217896 -0.213135 2.15625 0.220825 0.186646 2.90625 0.929688 -1.62402 1.60059 0.0589294 3.01367 0.414307 -1.74902 -0.890137 1.03809 -0.362793 0.901367 -0.253418 1.17188 0.325684 0.621094][-NAN -0.00503922 -0.0176392 -0.0256348 0.00799561 0.0133362 -0.0719604 -0.0431213 0.0621338 0.0771484 -0.192505 0.00991821 0.0211182 0.0186462 0.134644 -0.188232 0.0141068 -0.0140839 -0.167969 -0.714355 0.582031 -0.0226593 0.175415 -0.542969 0.526855 -0.784668 -0.745605 0.664062 1.34375 0.859375 0.0915527 0.0249634][-NAN 0.00349426 -0.0563354 0.0125809 0.0175171 -0.00355339 -0.0689697 0.0123138 0.0610046 -0.0119858 -0.09375 -0.040863 -0.0586548 0.124023 0.381348 -0.253906 0.0563354 0.566895 -0.797363 -0.787598 0.295898 0.540039 0.141357 -1.19629 0.1875 -1.1875 -0.949219 1.31348 1.43555 1.19531 -0.0949707 0.206665]][[...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]][[...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]][[...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]]
In https://gist.github.com/manishghop/529225d5e7e609b679f53fc4272be05c
print("inputsid: ")
print(input_ids) # tensor([[64790, 64792, 36474, 54591]]) torch.Size([1, 4]
@hanhanW Could you provide some guidance on what's going on in dispatch_9 and where to fix in IREE?
Can you help untangle the issue from SHARK? I think we need a simpler repro. The first step could be uploading the MLIR file somewhere and attaching a link to the issue.
The next step is to pass --iree-hal-dump-executable-sources-to=/tmp/dump to iree-compile. It will dump the executables to that path; please attach the dispatch_9.mlir to the issue. That will give us a smaller repro, which is codegen's input.
The input seems to be critical in this issue, so the next step is to generate inputs for the smaller repro. You can follow these tips to get a smaller reproducer. Note that it will print many values to stderr during execution if we pass --iree-flow-trace-dispatch-tensors to iree-compile. You will want to dump them to a text file; then you can search for NAN in the log and we will get a smaller repro.
Feel free to ping me if you run into any issues.
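A minimal sketch (not from the thread) of capturing that stderr trace and finding the first NAN/INF line; the module path and flags mirror the iree-run-module commands later in this thread, and the log path is hypothetical:

import subprocess

# iree-run-module prints the traced dispatch tensors to stderr when the module
# was compiled with --iree-flow-trace-dispatch-tensors, so capture stderr to a file.
with open("/tmp/forward-dispatch-tensors.txt", "w") as log:
    subprocess.run(
        [
            "iree-run-module",
            "--device=local-task",
            "--module=/tmp/chatglm9-dispatch-tensors.vmfb",
            "--function=forward",
            "--input=1x4xi64=1",
        ],
        stderr=log,
        check=False,  # keep whatever was traced even if the run aborts
    )

# Report the first line of the trace that mentions NAN or INF.
with open("/tmp/forward-dispatch-tensors.txt") as f:
    for lineno, line in enumerate(f, 1):
        if "NAN" in line or "INF" in line:
            print(f"first NAN/INF at line {lineno}: {line.strip()}")
            break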
@hanhanW Weird, I ran the prebuilt binary successfully this morning.
iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-flow-break-dispatch=@forward:9 --iree-flow-trace-dispatch-tensors -o /tmp/chatglm9-dispatch-tensors.vmfb
The terminal output: forwar9-dispatch-tensors.txt
The chatglm9-dispatch-tensors.vmfb
iree-run-module \
--device=local-task \
--module="/tmp/chatglm9-dispatch-tensors.vmfb" \
--function=forward \
--input="1x4xi64=1"
The iree-run-module run stops here:
=== forward_dispatch_4::forward_dispatch_4_generic_4x4608x64x64_f16 inputs ===
OUT_OF_RANGE; while invoking native function hal.buffer_view.trace; while calling import;
[ 1] native hal.buffer_view.trace:0 -
[ 0] bytecode module@0:4402 -; invoking function 'forward'; `sync func @forward(%input0: tensor<1x4xi64>) -> (%output0: tensor<1x4x65024xf16>)`
I am seeing the error at 6a60b64c69b832f2b8bfab32450f7136f3171509:
❯ build/tools/iree-opt ~/chatglm-6b-int4.mlir
/home/hanchung/chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
/home/hanchung/chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git
It looks like we need to regenerate the mlir file?
The terminal output: forwar9-dispatch-tensors.txt
Based on the log, I think we const-eval a NAN and it becomes an input. So the issue could be in another dispatch.
=== jit_eval_0_dispatch_0::jit_eval_0_dispatch_0_generic_32_f16 inputs ===
32xf16=0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875
=== jit_eval_0_dispatch_0::jit_eval_0_dispatch_0_generic_32_f16 outputs ===
32xf16=-NAN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Are you able to get the dispatch? I think it will show up if you pass -mlir-print-ir-after=iree-flow-annotate-dispatches and -mlir-elide-elementsattrs-if-larger=4 to iree-compile. Can you help extract the dispatch from the log?
I tried to rerun chatglm.py with nothing changed, and it hits the same issue we came across yesterday. Could you download and run it? It should generate the mlir quickly compared to me running it >> downloading it to my local system >> uploading it to a Google bucket >> you downloading/uploading it again to your VM.
(shark.venv) ➜ SHARK git:(main) ✗ python nan/chatglm.py
........
[DEBUG] Compiling torchscript graph
[DEBUG] Lowering Torch -> Linalg
[DEBUG] Successfully Generated mlir on device
[DEBUG] converting to bytecode
Saved falcon mlir at chatglm-6b-int4.mlir
Compiling for device : cpu-task
Configuring for device:cpu-task
Target triple found:x86_64-linux-gnu
Traceback (most recent call last):
File "/nodclouddata/chi/src/SHARK/nan/chatglm.py", line 170, in <module>
path = shark_module.save_module(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nodclouddata/chi/src/SHARK/shark/shark_inference.py", line 213, in save_module
return export_iree_module_to_vmfb(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nodclouddata/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 554, in export_iree_module_to_vmfb
flatbuffer_blob = compile_module_to_flatbuffer(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nodclouddata/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 338, in compile_module_to_flatbuffer
flatbuffer_blob = ireec.compile_file(
^^^^^^^^^^^^^^^^^^^
File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/core.py", line 257, in compile_file
result = invoke_immediate(cl)
^^^^^^^^^^^^^^^^^^^^
File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/binaries.py", line 200, in invoke_immediate
raise CompilerToolError(process)
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: -11
Diagnostics:
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
Stack dump:
0. Program arguments: /nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true
#0 0x00007f7dd755fc27 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x50f3c27)
#1 0x00007f7dd755d96e llvm::sys::RunSignalHandlers() (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x50f196e)
#2 0x00007f7dd75602ef SignalHandler(int) Signals.cpp:0:0
#3 0x00007f7dd245d420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#4 0x00007f7dd85e8531 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) FuseDequantizationMatmul.cpp:0:0
#5 0x00007f7dd85e4cb6 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::FuseDequantizationMatmulPass::runOnOperation() FuseDequantizationMatmul.cpp:0:0
#6 0x00007f7dd76e8cf9 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527ccf9)
#7 0x00007f7dd76e96d8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527d6d8)
#8 0x00007f7dd76eb456 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527f456)
#9 0x00007f7dd76e8eec mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527ceec)
#10 0x00007f7dd76ec7ea mlir::PassManager::run(mlir::Operation*) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x52807ea)
#11 0x00007f7dd74b8ee9 ireeCompilerInvocationPipeline (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x504cee9)
#12 0x00007f7dd76b12da mlir::iree_compiler::runIreecMain(int, char**)::$_2::operator()(iree_compiler_source_t*) const iree_compile_lib.cc:0:0
#13 0x00007f7dd76b0b97 mlir::iree_compiler::runIreecMain(int, char**) iree_compile_lib.cc:0:0
#14 0x00007f7dd227b083 __libc_start_main /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:342:3
#15 0x000000000020177e _start (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile+0x20177e)
Invoked with:
iree-compile /nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true
Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.
I just looked at the chatglm.py code; the mlir is directly generated and saved by torch_mlir.compile. It shouldn't change between runs.
Here you go: chatglm_dispatch.mlir. I also include the cmd I ran:
iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-flow-break-dispatch=@forward:9 --iree-flow-trace-dispatch-tensors -mlir-print-ir-after=iree-flow-annotate-dispatches -mlir-elide-elementsattrs-if-larger=4 -o /tmp/chatglm9.vmfb
Debug steps with this info:
1. Manually search for jit_eval_0_dispatch_0_generic_32_f16 in chatglm_dispatch.mlir to locate the buggy code.
2. Manually create an mlir with the buggy code:
builtin.module {
func.func @jit_eval_0_dispatch_0_generic_32_f16(%arg0: !flow.dispatch.tensor<readonly:tensor<32xf16>> loc("aten::reciprocal"("<eval_with_key>.5":11:17)), %arg1: !flow.dispatch.tensor<writeonly:tensor<32xf16>> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))) {
%cst = arith.constant 1.000000e+04 : f16 loc(callsite("aten::pow"("<eval_with_key>.5":10:12) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%cst_0 = arith.constant 0.000000e+00 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%cst_1 = arith.constant 1.000000e+00 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%0 = flow.dispatch.tensor.load %arg0, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readonly:tensor<32xf16>> -> tensor<32xf16> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%1 = tensor.empty() : tensor<32xf16> loc(callsite("aten::arange"("<eval_with_key>.5":8:13) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%2 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%0 : tensor<32xf16>) outs(%1 : tensor<32xf16>) {
^bb0(%in: f16 loc("aten::div"("<eval_with_key>.5":9:10)), %out: f16 loc("aten::reciprocal"("<eval_with_key>.5":11:17))):
%3 = math.powf %cst, %in : f16 loc(callsite("aten::pow"("<eval_with_key>.5":10:12) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%4 = arith.cmpf one, %3, %cst_0 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
cf.assert %4, "unimplemented: tensor with zero element" loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
%5 = arith.divf %cst_1, %3 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
linalg.yield %5 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
} -> tensor<32xf16> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
flow.dispatch.tensor.store %2, %arg1, offsets = [0], sizes = [32], strides = [1] : tensor<32xf16> -> !flow.dispatch.tensor<writeonly:tensor<32xf16>> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
return loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
} loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
} loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
3. Use iree-opt to delete the loc info.
Thank you! I can reproduce the issue starting with the dispatch.
#map = affine_map<(d0) -> (d0)>
func.func @main(%0: tensor<32xf16>) -> tensor<32xf16>{
%cst = arith.constant 1.000000e+04 : f16
%cst_0 = arith.constant 0.000000e+00 : f16
%cst_1 = arith.constant 1.000000e+00 : f16
%1 = tensor.empty() : tensor<32xf16>
%2 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%0 : tensor<32xf16>) outs(%1 : tensor<32xf16>) {
^bb0(%in: f16, %out: f16):
%3 = math.powf %cst, %in : f16
%4 = arith.cmpf one, %3, %cst_0 : f16
cf.assert %4, "unimplemented: tensor with zero element"
%5 = arith.divf %cst_1, %3 : f16
linalg.yield %5 : f16
} -> tensor<32xf16>
return %2 : tensor<32xf16>
}
Compile to vmfb: iree-compile --output-format=vm-bytecode --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=cascadelake --iree-llvmcpu-target-triple=x86_64-unknown-linux-gnu ~/z2.mlir -o /tmp/a.vmfb --iree-llvmcpu-enable-ukernels=all
Run the module: iree-run-module --device=local-sync --module=/tmp/a.vmfb --function=main --input=32xf16="0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875"
Then I got the output:
EXEC @main
result[0]: hal.buffer_view
32xf16=-NAN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I am taking a look at the dispatch.
I think there is a bug in the PolynomialApproximation pass. We have a wrong approximation for the math.powf op.
I stripped the dispatch down to a single powf op, e.g.,
#map = affine_map<(d0) -> (d0)>
module {
func.func @main(%arg0: tensor<32xf16>) -> tensor<32xf16> {
%cst = arith.constant 1.000000e+04 : f16
%cst_0 = arith.constant 0.000000e+00 : f16
%cst_1 = arith.constant 1.000000e+00 : f16
%0 = tensor.empty() : tensor<32xf16>
%1 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%arg0 : tensor<32xf16>) outs(%0 : tensor<32xf16>) {
^bb0(%in: f16, %out: f16):
%2 = math.powf %cst, %in : f16
linalg.yield %2 : f16
} -> tensor<32xf16>
return %1 : tensor<32xf16>
}
}
Running with the input returns NAN and INF.
❯ build/tools/iree-run-module --device=local-sync --module=/tmp/a.vmfb --function=main --input=32xf16="0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875"
EXEC @main
result[0]: hal.buffer_view
32xf16=-NAN INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF
If I comment out the pass, we can get reasonable outputs:
❯ build/tools/iree-run-module --device=local-sync --module=/tmp/a.vmfb --function=main --input=32xf16="0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875"
EXEC @main
result[0]: hal.buffer_view
32xf16=1 1.33398 1.77832 2.37109 3.16211 4.21875 5.625 7.5 10 13.3359 17.7812 23.7188 31.625 42.1562 56.2188 75 100 133.375 177.875 237.125 316.25 421.75 562.5 750 1000 1334 1778 2372 3162 4216 5624 7500
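For reference (my check, not from the thread), these values are exactly 10000 raised to the traced inputs i/32, rounded to f16, so the un-approximated path is numerically correct:

import numpy as np

# The traced input is i/32 for i in 0..31; the dispatch computes 10000 ** x.
x = np.arange(32, dtype=np.float64) / 32.0
print(np.float16(10000.0 ** x))
# [1.0 1.334 1.778 2.371 3.162 ... 5624.0 7500.0], matching the output above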
The implementation is at https://github.com/llvm/llvm-project/blob/2a9d8caf29ca2b2cf4758db31c64fd20cb5eb3bf/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp#L165-L192
@bviyer @rsuderman can you help review if the approximation is correct?
I have a workaround for the issue: https://github.com/openxla/iree/pull/15927
We can remove the workaround after fixing the polynomial approximation issue.
PYTHON TEST FAIL. Details are here: chatglm_fail_1214.txt
#10 0x00007f7096798990 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::QuantizedMatmulRewriter::precondition() /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:330:61
#11 0x00007f70967982de mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:767:18
I think you are running into new issues. The mlir file is regenerated, and we cannot compile it using the IREE main branch. It crashes in FuseDequantizationMatmul.cpp. @Max191 can you coordinate with @AmosLewis on the crash?
Downloading the model now. I'll try to repro once it's downloaded. Is there a specific iree-compile command I should try? Otherwise I'll just use whatever chatglm.py is doing.
chatglm.py should be enough; it would be better to use chatglm.py to reproduce the error locally. It will download the model from Hugging Face and use torch_mlir.compile to generate and save the mlir model as chatglm-6b-int4.mlir, then use shark_module.save_module to run iree-compile. If you look at chatglm_fail_log_1214.txt line 611, there is an equivalent iree-compile cmd you can use:
iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true
@AmosLewis I am getting this same error even when generating with chatglm.py:
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: 1
Diagnostics:
chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git
Is there something I need to do other than running the script with ToM SHARK?
@AmosLewis Can you try generating with a fresh venv on ToM shark if you haven't already? We aren't able to reproduce the error you're hitting, and I want to make sure we have the same environment and versions for everything.
I have seen this error. You can pip uninstall the iree-compiler and iree-runtime packages, then set PYTHONPATH to your local iree-build:
export PYTHONPATH=/nodclouddata/chi/src/iree-build/compiler/bindings/python:/nodclouddata/chi/src/iree-build/runtime/bindings/python:$PYTHONPATH
The iree commit is bc0b7d42bbd04b4af0a86eb56556ad8fcc6985a2. This is to make sure Hanhan's fix of math.powf is enabled.
I have listed this info (the venv and iree version details) in the comments of chatglm_fail_1214.txt.
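A quick way (a suggestion, not from the thread) to confirm the local build is being imported instead of the pip releases:

import iree.compiler
import iree.runtime

# Both paths should point into the local iree-build bindings, not site-packages;
# otherwise the pip-installed packages are still shadowing the local build.
print(iree.compiler.__file__)
print(iree.runtime.__file__)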
@AmosLewis Thanks for pointing me to that info! I was able to reproduce and fix the issue on my side. The quantized matmul reassociation wasn't meant to support f16, but was not failing gracefully. I went ahead and added f16 support with https://github.com/openxla/iree/pull/15964, and I was able to compile the model. Let me know if you still have any issues after picking this.
Thanks. I will try your patch on my side. Could you also run the vmfb with this run_chatglm.py on your side? It tries to run the chatglm-9.vmfb generated by chatglm.py.
With all the previous fixes (https://github.com/openxla/iree/pull/15927 and https://github.com/openxla/iree/pull/15964), the compile error is fixed but the NAN issue still exists.
(shark.venv) ➜ SHARK git:(main) ✗ python nan/run_chatglm.py
tensor([[64790, 64792, 36474, 54591]]) torch.Size([1, 4])
/nodclouddata/chi/src/SHARK/nan/run_chatglm.py:13: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
input_ids = torch.tensor(input_ids).reshape([1, input_id_len])
Loading module /nodclouddata/chi/src/SHARK/chatglm.vmfb...
::: Detailed report (took longer than 2.5s):
+0.8661746978759766ms: get_iree_runtime_config
+20850.444555282593ms: mmap /nodclouddata/chi/src/SHARK/chatglm.vmfb
+20850.829124450684ms: ireert.SystemContext created
+20853.740215301514ms: module initialized
Successfully Loaded vmfb model
inputsid:
tensor([[64790, 64792, 36474, 54591]])
output:
[[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]]
Can you triage the issue like we've done above, and attach a reproducer like https://github.com/openxla/iree/issues/15661#issuecomment-1854720527?
Here is what I got: chatglm_fail_log_dispatch9_1218_with_max_15964.txt. It still breaks at dispatch 9, but this time it got stuck at the INF for about 40 minutes. I appended the repro steps in the comments as well.
=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 inputs ===
f32=-INF
f16=0
=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 outputs ===
4x4xf16=[0 -INF -INF -INF][0 0 -INF -INF][0 0 0 -INF][0 0 0 0]
It looks like other dispatches generate -INF and pass it to jit_eval_8_dispatch_0. We should look at the log above to see where the first NAN/INF is generated. Here is a tip I can think of:
grep -B 5 --max-count=1 -n NAN /path-to-log
grep -B 5 --max-count=1 -n INF /path-to-log
This should navigate you to the first place that generates NAN/INF.
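An alternative to grep (a rough sketch, not from the thread, assuming the "=== <dispatch> inputs/outputs ===" layout shown above; the log path is hypothetical) is to report the first dispatch whose outputs contain NAN or INF:

import re

current = None   # name of the dispatch currently being read
section = None   # "inputs" or "outputs"
with open("/tmp/forward-dispatch-tensors.txt") as f:
    for line in f:
        m = re.match(r"=== (\S+) (inputs|outputs) ===", line)
        if m:
            current, section = m.group(1), m.group(2)
            continue
        if section == "outputs" and ("NAN" in line or "INF" in line):
            print(f"first bad outputs come from {current}")
            break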
(shark.venv) ➜ tmp git:(main) ✗ grep -B 5 --max-count=1 -n NAN ./1218_chatglm_forward9-dispatch-tensors.txt
(shark.venv) ➜ tmp git:(main) ✗ grep -B 5 --max-count=1 -n INF ./1218_chatglm_forward9-dispatch-tensors.txt
53-
54-=== jit_eval_6_dispatch_0::jit_eval_6_dispatch_0_transpose outputs ===
55-f16=0
56-
57-=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 inputs ===
58:f32=-INF
I didn't find any dispatch that outputs INF to dispatch 8. I also tried printing the annotation here: 1218_chatglm_forward9-dispatch-tensors-annotation.mlir, then searched for jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16.
I know what's happening... This is happening in the const-eval stage, so all the inputs for these dispatches are constant data. It means that the frontend generates invalid constants or IREE reads the weights incorrectly. There are two things on my mind:
- Do we add f64->f32 demotion in the frontend? If the weight is in f64 type and we can't represent it using the f32 type, it could become INF or -INF (see the small example after this list).
- Can you check if the frontend generates valid weights? If the original weights are invalid, the bug is in the model itself.
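To illustrate the first point, a tiny NumPy example (my illustration, not taken from the model's actual weights) of how demoting an out-of-range f64 value overflows to INF:

import numpy as np

# f32 tops out around 3.4e38 and f16 around 65504; larger magnitudes become inf.
print(np.float32(1e40))    # inf
print(np.float16(1.0e5))   # inf
print(np.float16(-7.0e4))  # -inf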
I just elided the input chatglm-6b-int4.mlir with torch-mlir-opt --mlir-elide-elementsattrs-if-larger=4 chatglm-6b-int4.mlir > chatglm-6b-int4-elide.mlir and searched. Here it is: https://storage.googleapis.com/shark-public/chi/iree/chatglm/9/1218/chatglm-6b-int4-elide.mlir. If we search for "f64 to f32", there are 57 results doing the demotion. There are tons of "f64 to f16" as well. It looks like:
%cst_427 = arith.constant 1.000000e-05 : f64
...
%28 = linalg.generic {indexing_maps = [#map11, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%27 : tensor<4x1x1xf32>) outs(%24 : tensor<4x1x1xf32>) {
^bb0(%in: f32, %out: f32):
%2160 = arith.truncf %cst_427 : f64 to f32
%2161 = arith.addf %in, %2160 : f32
linalg.yield %2161 : f32
} -> tensor<4x1x1xf32>
%cst_428 = arith.constant 0.29730177875068026 : f64
...
%78 = linalg.generic {indexing_maps = [#map24, #map8], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%75 : tensor<1x32x4x128xf16>) outs(%74 : tensor<1x32x4x128xf16>) {
^bb0(%in: f16, %out: f16):
%2160 = arith.truncf %cst_428 : f64 to f16
%2161 = arith.mulf %in, %2160 : f16
linalg.yield %2161 : f16
} -> tensor<1x32x4x128xf16>
I have to go. One other thing we can try is adding the iree-llvmcpu-use-fast-min-max-ops flag to iree-compile. I don't know what the inputs are, but maybe they were always NaN/INF and are now propagated as such.
(we should also rename the flag -- I will take a look tomorrow)
Hello @hanhanW, could you please tell me where the IREE frontend generates invalid constants, or where IREE reads the weights? I'm a new student and I'm eager to learn about the IREE open-source project by investigating this issue. Thank you for your help! I truly appreciate it as I start exploring the IREE project.
Update: we can run the model without NaN on cascadelake in a clean build. Perhaps it can only be reproduced on a haswell CPU. I'm setting up an env on @AmosLewis's VM and will see if I can reproduce the issue.
I am able to produce reasonable output even on the same VM if I don't use --iree-global-opt-enable-quantized-matmul-reassociation. IMO, the flag is off by default, which means that it is a development flag; that path is not fully tested.
My experiments show that it is the root cause of the NaN: it produces NaN only if I add the flag. I don't know why it is added, but can we exclude the flag for now?
It looks like we are adding it here in SHARK: https://github.com/nod-ai/SHARK/blob/788cc9157c942a4c6f73e3a85f16b14c9ce4d4d5/shark/iree_utils/compile_utils.py#L46. @dan-garvey @monorimet can you help disable it in SHARK?
Yeah, we don't want to be adding this flag for anything other than llama2 on CPU. It is needed for llama2 performance, but it is still experimental.
Use SHARK with this commit https://github.com/nod-ai/SHARK/pull/2047 and the NAN issue should be fixed. Could you try it, @manishghop?
(shark.venv) ➜ nan git:(main) ✗ python run_chatglm.py
/home/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
tensor([[64790, 64792, 36474, 54591]]) torch.Size([1, 4])
/home/chi/src/SHARK/nan/run_chatglm.py:13: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
input_ids = torch.tensor(input_ids).reshape([1, input_id_len])
Loading module chatglm.vmfb...
Successfully Loaded vmfb model
::: Detailed report (took longer than 5.0s):
+0.3094673156738281ms: Load to device: torch.Size([1, 4])
+0.5853176116943359ms: Invoke function: forward
+6925.025939941406ms: Invoke complete
+6925.110816955566ms: Result to host
[[[-10.83 -10.83 0.533 ... -10.84 -10.83 -10.84 ]
[-12.5 -12.52 2.217 ... -12.54 -12.53 -12.51 ]
[ -9.59 -9.59 -0.3699 ... -9.62 -9.62 -9.61 ]
[ -9.586 -9.58 1.07 ... -9.56 -9.58 -9.57 ]]]
Related issue: https://github.com/openxla/iree/issues/16068
What happened?
I'm able to compile the PyTorch model into mlir and then convert the mlir model into a vmfb file. I used this code for compilation: https://gist.github.com/manishghop/55c741b5734b6f3fb041111a4b9be695
But while running inference I get a NaN error.
I used this code to run inference: https://gist.github.com/manishghop/529225d5e7e609b679f53fc4272be05c
What component(s) does this issue relate to?
Runtime