iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Numerical issue with chatglm2.vmfb model #15661

Closed: manishghop closed this issue 10 months ago

manishghop commented 11 months ago

What happened?

I'm able to compile the PyTorch model to MLIR and then convert the MLIR into a vmfb file. I used this code for compilation: https://gist.github.com/manishghop/55c741b5734b6f3fb041111a4b9be695

But while running inference I get a NaN error:

(screenshot: the inference output is all NaN)

I used this code to run the inference: https://gist.github.com/manishghop/529225d5e7e609b679f53fc4272be05c

Steps to reproduce your issue

  1. git clone https://github.com/nod-ai/SHARK.git
  2. cd SHARK
  3. Run the following in PowerShell:
     1. set-executionpolicy remotesigned
     2. Run setup_venv.ps1 from: https://github.com/nod-ai/SHARK

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

stellaraccident commented 11 months ago

@jinchen62 I can't assign you because you're not in the org, but this is what we were discussing you engaging on.

AmosLewis commented 11 months ago

Related issue https://github.com/openxla/iree/issues/15665

AmosLewis commented 11 months ago

To reproduce this error with the binary:

iree-run-module \
    --device=local-task \
    --module="/nodclouddata/chi/src/SHARK/chatglm.vmfb" \
    --function=forward \
    --input="1x4xi64=0"
EXEC @forward
result[0]: hal.buffer_view
1x4x65024xf16=[[NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN 
NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN...][...][...][...]]
AmosLewis commented 11 months ago

Further debug steps:

iree-compiler             20231211.611
iree-runtime              20231211.611

The original chatglm-6b-int4.mlir is 6.4 GB. To generate all the dispatch .mlir files for debugging, change chatglm.py line 170 to:

path = shark_module.save_module(
    "./",
    "chatglm",
    extra_args=["--iree-hal-dump-executable-sources-to=/nodclouddata/chi/src/SHARK/nan/dispatch/2"],
    debug=debug,
)

Then run python chatglm.py, or invoke iree-compile directly:

/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-hal-dump-executable-sources-to=/nodclouddata/chi/src/SHARK/nan/dispatch/2

After running each dispatch, the NaN issue first shows up in module_forward_dispatch_9.mlir. To generate the chatglm.vmfb to run module_forward_dispatch_9.mlir:

path = shark_module.save_module(
    "./",
    "chatglm",
    extra_args=["--iree-flow-break-dispatch=@forward:9"],
    debug=debug,
)

Then run python chatglm.py, or invoke iree-compile directly to get the 518.3 MB chatglm.vmfb:

/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-flow-break-dispatch=@forward:9

To run the chatglm.vmfb for module_forward_dispatch_9.mlir:

iree-run-module \
    --device=local-task \
    --module="/nodclouddata/chi/src/SHARK/chatglm.vmfb" \
    --function=forward \
    --input="1x4xi64=1"

The output is

EXEC @forward
result[0]: hal.buffer_view
4x32x32xf16=[[-NAN 0.142822 0.132812 0.201294 0.0273285 -0.0444946 0.0615845 0.0222778 0.187256 -0.0736694 0.178589 0.141602 0.267578 -0.179321 -0.114014 0.188354 0.59375 0.647949 -0.217773 -0.503906 -0.388672 0.281006 0.0260773 1.03125 -0.384766 -0.44751 -0.357178 0.78125 1.03125 -0.484375 -0.445557 0.597656][-NAN 0.0149841 -0.202393 -0.0646973 -0.0947876 0.0257111 0.152954 -0.312988 0.164185 -0.301025 0.102722 0.269531 0.29541 0.0250854 -0.601562 0.660156 0.949219 0.508301 0.030426 -1.25781 0.320068 0.363281 -0.443359 1.88281 0.0531006 -0.734375 0.106262 0.765625 0.984375 -0.757812 -0.679688 0.396484][-NAN -0.377441 -0.0886841 -0.143555 0.0489197 -0.0198364 0.0590515 0.121643 -0.0209961 -0.157227 0.27124 0.306641 0.174561 -0.405762 -0.613281 0.531738 1.05469 0.535156 0.131714 -1.2959 0.702637 0.287354 -0.252441 2.01562 -0.169434 -0.474609 -0.255859 0.972656 1.05469 -0.734375 -0.722656 0.566406][-NAN 0.406738 0.384521 0.460938 -0.2229 0.0623779 0.00845337 0.126343 -0.142822 -0.105225 -0.104553 0.192627 0.00684357 -0.427246 0.0197601 -0.00178814 0.480713 0.574219 -0.322266 -0.554688 -0.0859985 0.0394897 0.304932 1.54688 -0.246704 -0.396484 -0.257568 0.910156 0.914551 -0.65332 -0.535645 0.605957][-NAN -0.418457 -0.402832 -0.480713 0.0939941 -0.0409241 0.30249 -0.0563049 0.15918 -0.0656738 0.209595 0.157959 0.251953 0.0300598 -0.205811 0.325928 0.308594 0.347656 0.33374 -0.671875 0.43457 0.663086 -0.179443 0.941406 0.197021 -0.231445 -0.310303 0.406006 0.746094 -0.277344 -0.31665 0.263916][-NAN 0.108398 0.976562 0.992188 -0.707031 -0.459229 -0.108398 1.21875 0.566406 1.47656 0.648438 0.671875 0.929688 0.65625 -1.15625 0.376953 0.960938 -0.263672 1.04688 -1.27344 0.28125 0.0374451 -0.219727 1.57812 0.455078 -0.566406 0.194458 0.382812 0.761719 -0.28125 -0.71875 0.18457][-NAN 0.218872 0.198364 0.351562 -0.138916 0.194336 0.00869751 -0.0662231 0.174927 -0.143433 0.121643 0.145752 0.0852051 -0.3396 -0.230835 0.3125 0.400635 0.507812 -0.310547 -0.617188 0.190308 -0.125977 0.181763 1.53906 -0.417969 -0.363037 -0.118103 1 0.707031 -0.621094 -0.65625 0.442871][-NAN 0.242432 -0.172485 -0.0402527 -0.158447 -0.0847778 0.19043 0.000880241 0.300537 -0.0539551 0.269775 0.0354309 0.258057 -0.121643 -0.163818 0.256104 0.828613 0.535156 -0.0917358 -0.554688 0.0227509 0.523438 -0.273438 0.672363 0.153564 -0.202271 -0.429688 0.463379 0.737793 -0.60498 -0.177979 0.758301][-NAN 0.16687 -0.049469 0.0241852 -0.168091 -1.07812 0.953125 1.00781 0.984375 0.851562 1.375 1.01562 1.4375 1.32812 -1.49219 0.667969 1.23438 -0.408203 1.47656 -2.23438 1.17188 0.65625 -0.546875 1.6875 0.976562 -0.550781 1.15625 -0.126953 1 -0.0634766 -1.09375 0.0656738][-NAN 0.0722046 -0.0282593 -0.0578003 0.0189056 -0.0503235 -0.0721436 -0.245239 -0.0542908 -0.30835 0.0949707 0.0890503 0.503418 -0.294922 -0.0880737 -0.102173 0.546387 0.570312 -0.406494 -0.66748 -0.461182 -0.0996704 0.20105 1.55469 -0.293213 -0.543457 -0.464844 0.964844 0.753418 -0.691406 -0.726562 0.550781][-NAN 0.0623474 -0.0394592 -0.0491028 -0.109253 -0.00532532 -0.00837708 -0.0166473 0.0957642 -0.0270538 0.253906 0.443115 0.476562 0.251953 -0.66748 0.351562 1.03125 0.245117 0.32373 -0.913086 0.651855 0.339844 -0.0430298 1.13965 0.253662 -0.443115 -0.400391 0.715332 0.871094 -0.765625 -0.373047 0.976074][-NAN -0.107117 -0.0581665 -0.162354 -0.0953979 -0.192139 0.2771 -0.173218 0.613281 0.0386353 0.486084 0.209839 0.773438 -0.156982 -0.0804443 0.19519 0.91748 0.921875 -0.349854 -0.714355 -0.102722 -0.243286 0.0998535 2.09375 -0.378662 -0.156128 -0.394287 0.515625 0.354004 
-0.265869 -0.277588 0.491211][-NAN -0.155396 -0.106873 -0.126221 0.0148773 0.0980835 -0.146851 -0.0118637 0.009552 0.078125 0.221802 -0.013031 -0.110718 -0.176758 -0.128174 0.153442 0.652344 0.566406 -0.490479 -0.496094 0.0133133 -0.204956 -0.100708 1.32031 -0.809082 -0.421631 -0.398682 1.11719 0.663574 -0.898438 -0.554199 0.617188][-NAN -0.137329 0.0624084 0.424072 0.0146179 -1.21875 0.0838623 1.03223 1.51562 1.05566 1.14844 0.859375 1.46094 0.404297 -1.5 0.289307 1.39062 0.367676 1.96875 -1.41406 0.628418 0.296875 -0.174927 1.4375 0.999512 -0.0878906 -0.0773315 0.169678 1.20312 -0.226929 -0.371094 0.211792][-NAN 0.209229 0.213989 0.291016 -0.0252075 0.0462341 -0.00183678 0.0205231 0.103271 -0.152344 0.106689 -0.00227547 0.0877686 -0.335449 0.100769 0.0362244 0.308838 0.496338 -0.285156 -0.449463 -0.365479 0.0715942 0.10907 1.07812 -0.0349426 -0.400146 -0.386719 0.898438 0.839844 -0.542969 -0.5 0.734375][-NAN 0.158203 0.113281 0.0916748 -0.314697 -0.23645 0.371338 0.273682 0.29126 0.233398 0.342041 0.120728 0.359375 0.135864 -0.281494 0.0345154 0.6875 0.695312 -0.478271 -0.777344 -0.150269 0.0703125 0.257812 1 -0.39624 -0.28125 -0.394531 0.808105 1.17188 -0.718262 -0.427734 0.629395][-NAN 0.0286102 0.0839233 -0.0317993 0.0648804 0.0410461 -0.119385 -0.0699463 -0.0432739 -0.0532837 -0.0927734 0.102051 -0.0171661 -0.0993652 -0.523438 0.0148392 -0.135376 -0.485107 -0.266357 -0.870117 -1.28906 1.11035 -1.53906 -1.54785 0.203857 -0.368896 -1.01562 1.50879 1.21777 1.34375 -0.326904 0.0969849][-NAN 0.0119705 0.15625 -0.078125 0.0838623 0.117615 0.0111084 0.0110474 0.114685 -0.0796509 -0.0977783 -0.0163574 -0.0653076 -0.126831 -0.0888672 0.347656 -0.0478821 0.196533 0.474609 0.0300446 0.259277 -0.0598145 -0.271484 -0.63623 0.551758 -1.09375 -0.999512 0.929688 0.520508 1.22559 0.395752 0.431396][-NAN 0.0478516 0.140869 -0.114929 0.174805 0.211914 -0.0474243 -0.017868 0.0183258 -0.271484 -0.0900269 -0.0342712 -0.476318 0.213745 0.401367 0.0577393 0.635742 1.01562 -0.858398 0.175293 0.157837 1.2793 0.0521545 -1.16309 0.342041 -0.451904 -0.809082 1.14746 0.90625 0.898438 0.476318 0.0497742][-NAN 0.0640259 0.0469971 -0.10199 0.17395 -0.0628052 0.169678 0.0995483 -0.0897217 -0.0575867 0.488525 -0.302246 -0.0686646 0.427246 0.589355 -0.454834 0.939941 1.02246 -1.1709 0.296387 -0.130249 1.66309 0.376465 -1.26367 0.287842 -0.722656 -0.776855 1.31934 1.125 0.648438 -0.0335388 -0.0169678][-NAN -0.00254059 0.0898438 -0.0976562 0.28125 -0.069519 -0.115906 0.214966 0.190796 0.26416 0.0657959 0.394043 -0.27124 -0.190796 -0.776855 -0.547852 0.22998 1.0625 -1.14062 0.730469 0.967773 0.613281 -0.394287 -1.19531 -0.0442505 -0.391602 -0.119019 1.125 -0.135498 1.59277 0.535645 0.895508][-NAN -0.0597229 -0.071167 0.0662231 -0.0159149 -0.133789 0.227051 0.224487 0.36499 0.00257683 -0.111328 -0.680176 -0.569824 1.18066 0.566895 -2.87305 -0.145508 1.56348 -2.5 -0.0403442 1.41406 2.31055 0.550293 -1.30469 -0.894531 -0.69873 -1.20215 1.96777 -0.270996 1.58594 1.05371 1.0625][-NAN 0.324951 0.400879 -0.443359 0.0141144 0.854004 -0.370117 -0.634766 -1.17969 -0.693359 -1.47168 -0.571777 0.0400085 1.77441 0.83252 0.213135 2.2207 1.91602 -1.74219 1.85938 -0.00273514 3.35742 0.19751 -1.52441 -1.39062 0.973145 0.28833 1.13281 -0.520508 0.650879 0.53125 0.70166][-NAN 0.447266 0.277344 -0.310303 -0.0130539 0.466797 -0.240845 -0.660156 -0.695312 -0.390625 -1.30469 -0.131836 0.071106 0.244385 -0.296875 0.757324 1.17871 0.486328 -0.820312 0.683105 -0.526367 1.99121 0.130249 -1.24121 -0.4375 0.0282288 -0.688477 1.16406 0.593262 0.910156 
-0.0026207 0.203125][-NAN -0.0357056 0.0184784 -0.0187378 0.0130081 -0.0167542 -0.0133667 -0.0304565 0.0339355 0.0802612 0.0080719 -0.25415 -0.120972 0.373535 0.292236 -0.152222 0.428711 0.916992 -0.752441 0.453125 0.504395 0.974121 0.253906 -1.01367 0.0643921 -0.903809 -1.17188 1.40527 0.62207 1.11621 0.479248 0.315674][-NAN 0.0445557 0.259766 -0.143677 0.180664 0.206177 -0.279053 0.078125 0.129028 -0.233643 0.380859 0.255859 -1.125 -0.59668 1.29688 -0.35376 0.746582 1.75684 -1.10938 0.652344 0.0236053 2.10938 0.203125 -1.55469 -0.265869 -0.644531 -0.566895 0.973145 0.291016 0.996582 0.297119 0.574707][-NAN -0.297119 -0.030365 0.026535 0.0256805 0.0462036 -0.316162 -0.402588 0.194458 -0.162231 0.166016 0.32251 0.0821533 -0.155151 0.343994 -0.206421 0.151733 0.558594 -0.289062 -0.202026 -0.00947571 0.878418 -0.558105 -0.788574 0.519531 -0.186401 -0.182739 0.749512 0.341797 0.54248 0.0967407 -0.0722656][-NAN -0.041748 -0.00181484 -0.0288239 0.0518799 0.0551147 0.00549316 0.0603943 -0.00839996 0.123169 -0.226807 -0.19812 -0.0264587 0.171021 0.229248 -0.0961304 0.572754 1.03809 -0.81543 0.386963 0.70752 0.901367 0.213745 -1.0459 0.0230408 -0.837402 -0.825195 1.46875 0.770996 1.15625 0.296387 0.325928][-NAN 0.00265312 0.429688 0.269531 -0.367188 0.609375 -0.0617676 -1.53125 1.35938 -0.357422 -1.375 -0.996094 -0.314209 0.212036 1.23438 0.161865 1.64844 1.75 -1.49219 0.625 0.110413 2.1875 0.318359 -1.09375 -0.294922 0.171143 0.0635986 0.621094 0.199341 0.302734 0.134766 -0.0419617][-NAN -0.198364 -0.183716 0.275391 -0.0667114 -0.0384216 -0.116394 -0.30127 -0.742188 -0.457031 -0.045105 -0.217896 -0.213135 2.15625 0.220825 0.186646 2.90625 0.929688 -1.62402 1.60059 0.0589294 3.01367 0.414307 -1.74902 -0.890137 1.03809 -0.362793 0.901367 -0.253418 1.17188 0.325684 0.621094][-NAN -0.00503922 -0.0176392 -0.0256348 0.00799561 0.0133362 -0.0719604 -0.0431213 0.0621338 0.0771484 -0.192505 0.00991821 0.0211182 0.0186462 0.134644 -0.188232 0.0141068 -0.0140839 -0.167969 -0.714355 0.582031 -0.0226593 0.175415 -0.542969 0.526855 -0.784668 -0.745605 0.664062 1.34375 0.859375 0.0915527 0.0249634][-NAN 0.00349426 -0.0563354 0.0125809 0.0175171 -0.00355339 -0.0689697 0.0123138 0.0610046 -0.0119858 -0.09375 -0.040863 -0.0586548 0.124023 0.381348 -0.253906 0.0563354 0.566895 -0.797363 -0.787598 0.295898 0.540039 0.141357 -1.19629 0.1875 -1.1875 -0.949219 1.31348 1.43555 1.19531 -0.0949707 0.206665]][[...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]][[...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]][[...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]]

In https://gist.github.com/manishghop/529225d5e7e609b679f53fc4272be05c

print("inputsid: ")
print(input_ids)  # tensor([[64790, 64792, 36474, 54591]]) torch.Size([1, 4])

@hanhanW Could you provide some guidance on what's going on in dispatch_9 and where to fix in IREE?

hanhanW commented 11 months ago

Can you help untangle the issue from SHARK? I think we need a simpler repro. The first step could be uploading the MLIR file somewhere and attaching a link to the issue.

The next step is to pass --iree-hal-dump-executable-sources-to=/tmp/dump to iree-compile. It will dump the executables to that path; please attach the dispatch_9.mlir to the issue. That will give us a smaller repro, which is codegen's input.

The input seems to be critical in this issue, so the next step is to generate inputs for the smaller repro. You can follow the tips below to get the smaller reproducer.

Note that it will print many values to stderr during execution if we pass --iree-flow-trace-dispatch-tensors to iree-compile. You will want to dump them to a text file. Then you can search for NAN in the log and we will get a smaller repro.
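For example, a minimal sketch (output file paths here are placeholders; the flags are the ones used elsewhere in this thread):

# Compile with dispatch tracing enabled.
iree-compile chatglm-6b-int4.mlir \
    --iree-input-type=tm_tensor \
    --iree-hal-target-backends=llvm-cpu \
    --iree-flow-trace-dispatch-tensors \
    -o /tmp/chatglm-trace.vmfb

# The trace is printed to stderr at run time; capture it and search for the first NAN.
iree-run-module --device=local-task --module=/tmp/chatglm-trace.vmfb \
    --function=forward --input="1x4xi64=1" 2> /tmp/trace.txt
grep -n NAN /tmp/trace.txt | head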

Feel free to ping me if you run into any issues.

AmosLewis commented 11 months ago

@hanhanW Weird, I ran the prebuilt binary successfully this morning.

iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-flow-break-dispatch=@forward:9 --iree-flow-trace-dispatch-tensors -o /tmp/chatglm9-dispatch-tensors.vmfb

The terminal output: forwar9-dispatch-tensors.txt. The module: chatglm9-dispatch-tensors.vmfb.

iree-run-module \
    --device=local-task \
    --module="/tmp/chatglm9-dispatch-tensors.vmfb" \
    --function=forward \
    --input="1x4xi64=1"

iree-run-module stops here:

=== forward_dispatch_4::forward_dispatch_4_generic_4x4608x64x64_f16 inputs ===
OUT_OF_RANGE; while invoking native function hal.buffer_view.trace; while calling import; 
[ 1]   native hal.buffer_view.trace:0 -
[ 0] bytecode module@0:4402 -; invoking function 'forward'; `sync func @forward(%input0: tensor<1x4xi64>) -> (%output0: tensor<1x4x65024xf16>)`
hanhanW commented 11 months ago

I am seeing the error at 6a60b64c69b832f2b8bfab32450f7136f3171509:

❯ build/tools/iree-opt ~/chatglm-6b-int4.mlir
/home/hanchung/chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
/home/hanchung/chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

It looks like we need to regenerate the mlir file?

hanhanW commented 11 months ago

The terminal output: forwar9-dispatch-tensors.txt

Based on the log, I think we const-eval a NAN and it becomes an input, so the issue could be in another dispatch.

=== jit_eval_0_dispatch_0::jit_eval_0_dispatch_0_generic_32_f16 inputs ===
32xf16=0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875

=== jit_eval_0_dispatch_0::jit_eval_0_dispatch_0_generic_32_f16 outputs ===
32xf16=-NAN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Are you able to get the dispatch? I think it will show up if you pass -mlir-print-ir-after=iree-flow-annotate-dispatches and -mlir-elide-elementsattrs-if-larger=4 to iree-compile. Can you help extract the dispatch from the log?

AmosLewis commented 11 months ago

I am seeing the error at 6a60b64:

❯ build/tools/iree-opt ~/chatglm-6b-int4.mlir
/home/hanchung/chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
/home/hanchung/chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

It looks like we need to regenerate the mlir file?

I tried rerunning chatglm.py with nothing changed, and it shows the same issue we came across yesterday. Could you download and run it? That should generate the mlir quickly, compared to me running it >> downloading to my local system >> uploading to a Google bucket >> you downloading/uploading it again to your VM.

(shark.venv) ➜  SHARK git:(main) ✗ python nan/chatglm.py
........
[DEBUG] Compiling torchscript graph
[DEBUG] Lowering Torch -> Linalg
[DEBUG] Successfully Generated mlir on device
[DEBUG] converting to bytecode
Saved falcon mlir at  chatglm-6b-int4.mlir
Compiling for device : cpu-task
Configuring for device:cpu-task
Target triple found:x86_64-linux-gnu
Traceback (most recent call last):
  File "/nodclouddata/chi/src/SHARK/nan/chatglm.py", line 170, in <module>
    path = shark_module.save_module(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark/shark_inference.py", line 213, in save_module
    return export_iree_module_to_vmfb(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 554, in export_iree_module_to_vmfb
    flatbuffer_blob = compile_module_to_flatbuffer(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 338, in compile_module_to_flatbuffer
    flatbuffer_blob = ireec.compile_file(
                      ^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/core.py", line 257, in compile_file
    result = invoke_immediate(cl)
             ^^^^^^^^^^^^^^^^^^^^
  File "/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/binaries.py", line 200, in invoke_immediate
    raise CompilerToolError(process)
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: -11
Diagnostics:
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
Stack dump:
0.  Program arguments: /nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true
 #0 0x00007f7dd755fc27 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x50f3c27)
 #1 0x00007f7dd755d96e llvm::sys::RunSignalHandlers() (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x50f196e)
 #2 0x00007f7dd75602ef SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f7dd245d420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007f7dd85e8531 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) FuseDequantizationMatmul.cpp:0:0
 #5 0x00007f7dd85e4cb6 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::FuseDequantizationMatmulPass::runOnOperation() FuseDequantizationMatmul.cpp:0:0
 #6 0x00007f7dd76e8cf9 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527ccf9)
 #7 0x00007f7dd76e96d8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527d6d8)
 #8 0x00007f7dd76eb456 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527f456)
 #9 0x00007f7dd76e8eec mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x527ceec)
#10 0x00007f7dd76ec7ea mlir::PassManager::run(mlir::Operation*) (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x52807ea)
#11 0x00007f7dd74b8ee9 ireeCompilerInvocationPipeline (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/_mlir_libs/libIREECompiler.so+0x504cee9)
#12 0x00007f7dd76b12da mlir::iree_compiler::runIreecMain(int, char**)::$_2::operator()(iree_compiler_source_t*) const iree_compile_lib.cc:0:0
#13 0x00007f7dd76b0b97 mlir::iree_compiler::runIreecMain(int, char**) iree_compile_lib.cc:0:0
#14 0x00007f7dd227b083 __libc_start_main /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:342:3
#15 0x000000000020177e _start (/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile+0x20177e)

Invoked with:
 iree-compile /nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true

Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.
AmosLewis commented 11 months ago

I am seeing the error at 6a60b64:

❯ build/tools/iree-opt ~/chatglm-6b-int4.mlir
/home/hanchung/chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
/home/hanchung/chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

It looks like we need to regenerate the mlir file?

I just looked at the chatglm.py code; the mlir is directly generated and saved by torch_mlir.compile. It shouldn't change between runs.

AmosLewis commented 11 months ago

The terminal output: forwar9-dispatch-tensors.txt

Based on the log, I think we const-eval a NAN and it becomes an input, so the issue could be in another dispatch.

=== jit_eval_0_dispatch_0::jit_eval_0_dispatch_0_generic_32_f16 inputs ===
32xf16=0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875

=== jit_eval_0_dispatch_0::jit_eval_0_dispatch_0_generic_32_f16 outputs ===
32xf16=-NAN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Are you able to get the dispatch? I think it will show up if you pass -mlir-print-ir-after=iree-flow-annotate-dispatches and -mlir-elide-elementsattrs-if-larger=4 to iree-compile. Can you help extract the dispatch from the log?

Here you go: chatglm_dispatch.mlir. Here is also the command I ran:

iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=/nodclouddata/chi/src/SHARK/nan/dispatch/2/tmp/core-reproducer.mlir --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true --iree-flow-break-dispatch=@forward:9 --iree-flow-trace-dispatch-tensors -mlir-print-ir-after=iree-flow-annotate-dispatches -mlir-elide-elementsattrs-if-larger=4 -o /tmp/chatglm9.vmfb

Debug steps with this info:

  1. Manually search jit_eval_0_dispatch_0_generic_32_f16 in chatglm_dispatch.mlir to locate the buggy code.

  2. Manually create an MLIR file with the buggy code:

builtin.module {
      func.func @jit_eval_0_dispatch_0_generic_32_f16(%arg0: !flow.dispatch.tensor<readonly:tensor<32xf16>> loc("aten::reciprocal"("<eval_with_key>.5":11:17)), %arg1: !flow.dispatch.tensor<writeonly:tensor<32xf16>> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))) {
        %cst = arith.constant 1.000000e+04 : f16 loc(callsite("aten::pow"("<eval_with_key>.5":10:12) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        %cst_0 = arith.constant 0.000000e+00 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        %cst_1 = arith.constant 1.000000e+00 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        %0 = flow.dispatch.tensor.load %arg0, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readonly:tensor<32xf16>> -> tensor<32xf16> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        %1 = tensor.empty() : tensor<32xf16> loc(callsite("aten::arange"("<eval_with_key>.5":8:13) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        %2 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%0 : tensor<32xf16>) outs(%1 : tensor<32xf16>) {
        ^bb0(%in: f16 loc("aten::div"("<eval_with_key>.5":9:10)), %out: f16 loc("aten::reciprocal"("<eval_with_key>.5":11:17))):
          %3 = math.powf %cst, %in : f16 loc(callsite("aten::pow"("<eval_with_key>.5":10:12) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
          %4 = arith.cmpf one, %3, %cst_0 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
          cf.assert %4, "unimplemented: tensor with zero element" loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
          %5 = arith.divf %cst_1, %3 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
          linalg.yield %5 : f16 loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        } -> tensor<32xf16> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        flow.dispatch.tensor.store %2, %arg1, offsets = [0], sizes = [32], strides = [1] : tensor<32xf16> -> !flow.dispatch.tensor<writeonly:tensor<32xf16>> loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
        return loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
      } loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
    } loc(callsite("aten::reciprocal"("<eval_with_key>.5":11:17) at "aten::reciprocal"("<eval_with_key>.5":11:17)))
  3. Use iree-opt to delete the loc info (see the sketch below).
  4. Manually delete the flow dialect ops to get the z2.mlir in hanhanW's comment below.
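A sketch of steps 3 and 4 (assuming the snippet above was saved as jit_eval_0_dispatch_0.mlir, a hypothetical file name; MLIR drops loc(...) info on re-printing unless --mlir-print-debuginfo is passed):

# Round-trip through iree-opt without --mlir-print-debuginfo to strip locations.
iree-opt jit_eval_0_dispatch_0.mlir -o jit_eval_0_dispatch_0_noloc.mlir
# Then hand-edit: replace the flow.dispatch.tensor load/store with plain
# tensor arguments and results, giving the z2.mlir in the next comment.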
hanhanW commented 11 months ago

Thank you! I can reproduce the issue starting with the dispatch.

#map = affine_map<(d0) -> (d0)>
func.func @main(%0: tensor<32xf16>) -> tensor<32xf16>{
  %cst = arith.constant 1.000000e+04 : f16
  %cst_0 = arith.constant 0.000000e+00 : f16
  %cst_1 = arith.constant 1.000000e+00 : f16
  %1 = tensor.empty() : tensor<32xf16>
  %2 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%0 : tensor<32xf16>) outs(%1 : tensor<32xf16>) {
  ^bb0(%in: f16, %out: f16):
    %3 = math.powf %cst, %in : f16
    %4 = arith.cmpf one, %3, %cst_0 : f16
    cf.assert %4, "unimplemented: tensor with zero element"
    %5 = arith.divf %cst_1, %3 : f16
    linalg.yield %5 : f16
  } -> tensor<32xf16>
  return %2 : tensor<32xf16>
}

Compile to vmfb:

iree-compile --output-format=vm-bytecode --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=cascadelake --iree-llvmcpu-target-triple=x86_64-unknown-linux-gnu ~/z2.mlir -o /tmp/a.vmfb --iree-llvmcpu-enable-ukernels=all

Run the module:

iree-run-module --device=local-sync --module=/tmp/a.vmfb --function=main --input=32xf16="0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875"

Then I got the output:

EXEC @main
result[0]: hal.buffer_view
32xf16=-NAN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I am taking a look at the dispatch.

hanhanW commented 11 months ago

I think there is a bug in the PolynomialApproximation pass: we have a wrong approximation for the math.powf op.

https://github.com/openxla/iree/blob/5889a12cf737abed894f9ef095c52265259b9d90/compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp#L657-L658

I stripped the dispatch down so it only has a single powf op, e.g.:

#map = affine_map<(d0) -> (d0)>
module {
  func.func @main(%arg0: tensor<32xf16>) -> tensor<32xf16> {
    %cst = arith.constant 1.000000e+04 : f16
    %cst_0 = arith.constant 0.000000e+00 : f16
    %cst_1 = arith.constant 1.000000e+00 : f16
    %0 = tensor.empty() : tensor<32xf16>
    %1 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%arg0 : tensor<32xf16>) outs(%0 : tensor<32xf16>) {
    ^bb0(%in: f16, %out: f16):
      %2 = math.powf %cst, %in : f16
      linalg.yield %2 : f16
    } -> tensor<32xf16>
    return %1 : tensor<32xf16>
  }
}

Running it with the same input returns NAN and INF:

❯ build/tools/iree-run-module --device=local-sync --module=/tmp/a.vmfb --function=main --input=32xf16="0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875"
EXEC @main
result[0]: hal.buffer_view
32xf16=-NAN INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF INF

If I comment out the pass, we can get reasonable outputs:

❯ build/tools/iree-run-module --device=local-sync --module=/tmp/a.vmfb --function=main --input=32xf16="0 0.03125 0.0625 0.09375 0.125 0.15625 0.1875 0.21875 0.25 0.28125 0.3125 0.34375 0.375 0.40625 0.4375 0.46875 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875"
EXEC @main
result[0]: hal.buffer_view
32xf16=1 1.33398 1.77832 2.37109 3.16211 4.21875 5.625 7.5 10 13.3359 17.7812 23.7188 31.625 42.1562 56.2188 75 100 133.375 177.875 237.125 316.25 421.75 562.5 750 1000 1334 1778 2372 3162 4216 5624 7500
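As a sanity check (a hypothetical numpy one-liner, not part of the original repro): all of the correct results above fit well within f16 range (max finite value 65504), which suggests the NaN/INF comes from the expansion's intermediate math rather than the final values overflowing:

# Reference: 10000**x for x = 0, 1/32, ..., 31/32, rounded to f16; max is ~7500 << 65504.
python3 -c "import numpy as np; x = np.arange(32) / 32.0; print(np.float16(10000.0 ** x))"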

The implementation is at https://github.com/llvm/llvm-project/blob/2a9d8caf29ca2b2cf4758db31c64fd20cb5eb3bf/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp#L165-L192

@bviyer @rsuderman can you help review if the approximation is correct?

hanhanW commented 11 months ago

I have a workaround for the issue: https://github.com/openxla/iree/pull/15927

We can remove the workaround after fixing the polynomial approximation issue.

AmosLewis commented 11 months ago

I have a workaround for the issue: #15927

We can remove the workaround after fixing the polynomial approximation issue.

PYTHON TEST FAILS. Details are in chatglm_fail_1214.txt.

hanhanW commented 11 months ago
#10 0x00007f7096798990 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::QuantizedMatmulRewriter::precondition() /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:330:61
#11 0x00007f70967982de mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:767:18

I think you are running into new issues. The mlir file was regenerated, and we cannot compile it with the IREE main branch. It crashes in FuseDequantizationMatmul.cpp. @Max191 can you coordinate with @AmosLewis on the crash?

Max191 commented 10 months ago
#10 0x00007f7096798990 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::QuantizedMatmulRewriter::precondition() /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:330:61
#11 0x00007f70967982de mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:767:18

I think you are running into new issues. The mlir file was regenerated, and we cannot compile it with the IREE main branch. It crashes in FuseDequantizationMatmul.cpp. @Max191 can you coordinate with @AmosLewis on the crash?

Downloading the model now. I'll try to repro once it's downloaded. Is there a specific iree-compile command I should try? Otherwise I'll just use whatever chatglm.py is doing.

AmosLewis commented 10 months ago
#10 0x00007f7096798990 mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::QuantizedMatmulRewriter::precondition() /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:330:61
#11 0x00007f70967982de mlir::iree_compiler::GlobalOptimization::(anonymous namespace)::reassociateDequantMatmul(mlir::RewriterBase&, mlir::linalg::GenericOp, mlir::linalg::GenericOp, int) /nodclouddata/chi/src/iree/compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp:767:18

I think you are running into new issues. The mlir file was regenerated, and we cannot compile it with the IREE main branch. It crashes in FuseDequantizationMatmul.cpp. @Max191 can you coordinate with @AmosLewis on the crash?

Downloading the model now. I'll try to repro once it's downloaded. Is there a specific iree-compile command I should try? Otherwise I'll just use whatever chatglm.py is doing.

chatglm.py should be enough; it's better to use chatglm.py to repeat the error locally. It will download the model from huggingface and use torch_mlir.compile to generate and save the mlir model as chatglm-6b-int4.mlir, then use shark_module.save_module to run iree-compile. If you look at chatglm_fail_log_1214.txt line 611, there is an equivalent iree-compile cmd you can use:

iree-compile chatglm-6b-int4.mlir --iree-input-type=tm_tensor --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=/nodclouddata/chi/src/iree-build/compiler/bindings/python/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-target-triple=x86_64-linux-gnu --iree-llvmcpu-enable-ukernels --iree-llvmcpu-stack-allocation-limit=256000 --iree-global-opt-enable-quantized-matmul-reassociation --iree-stream-resource-max-allocation-size=4294967295 --iree-vm-bytecode-module-strip-source-map=true --iree-util-zero-fill-elided-attrs --iree-opt-strip-assertions=false --verify=true

Max191 commented 10 months ago

@AmosLewis I am getting this same error even when generating with chatglm.py:

iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: 1
Diagnostics:
chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

Is there something I need to do other than running the script with ToM SHARK?

Max191 commented 10 months ago

@AmosLewis Can you try generating with a fresh venv on ToM shark if you haven't already? We aren't able to reproduce the error you're hitting, and I want to make sure we have the same environment and versions for everything.

AmosLewis commented 10 months ago

@AmosLewis I am getting this same error even when generating with chatglm.py:

iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
Error code: 1
Diagnostics:
chatglm-6b-int4.mlir:0:0: error: attempting to parse a byte at the end of the bytecode
chatglm-6b-int4.mlir:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

Is there something I need to do other than running the script with ToM SHARK?

I have seen this error. You can pip uninstall iree-compiler and iree-runtime, then set PYTHONPATH to your local iree-build/:

export PYTHONPATH=/nodclouddata/chi/src/iree-build/compiler/bindings/python:/nodclouddata/chi/src/iree-build/runtime/bindings/python:$PYTHONPATH

The iree commit is bc0b7d42bbd04b4af0a86eb56556ad8fcc6985a2. This is to make sure hanhanW's fix for math.powf is enabled. I have listed this info in the comments of chatglm_fail_1214.txt.

AmosLewis commented 10 months ago

@AmosLewis Can you try generating with a fresh venv on ToM shark if you haven't already? We aren't able to reproduce the error you're hitting, and I want to make sure we have the same environment and versions for everything.

I have listed the venv and iree version info in the comments of chatglm_fail_1214.txt

Max191 commented 10 months ago

@AmosLewis Thanks for pointing me to that info! I was able to reproduce and fix the issue on my side. The quantized matmul reassociation wasn't meant to support f16, but was not failing gracefully. I went ahead and added f16 support with https://github.com/openxla/iree/pull/15964, and I was able to compile the model. Let me know if you still have any issues after picking this.

AmosLewis commented 10 months ago

@AmosLewis Thanks for pointing me to that info! I was able to reproduce and fix the issue on my side. The quantized matmul reassociation wasn't meant to support f16, but was not failing gracefully. I went ahead and added f16 support with #15964, and I was able to compile the model. Let me know if you still have any issues after picking this.

Thanks, I will try your patch on my side. Could you also run the vmfb with this run_chatglm.py on your side? It tries to run the chatglm-9.vmfb generated by chatglm.py.

AmosLewis commented 10 months ago

With all the previous fixes (https://github.com/openxla/iree/pull/15927 and https://github.com/openxla/iree/pull/15964), the compile error is fixed, but the NAN issue still exists.

(shark.venv) ➜  SHARK git:(main) ✗ python nan/run_chatglm.py
tensor([[64790, 64792, 36474, 54591]]) torch.Size([1, 4])
/nodclouddata/chi/src/SHARK/nan/run_chatglm.py:13: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  input_ids = torch.tensor(input_ids).reshape([1, input_id_len])
Loading module /nodclouddata/chi/src/SHARK/chatglm.vmfb...
::: Detailed report (took longer than 2.5s):
  +0.8661746978759766ms: get_iree_runtime_config
  +20850.444555282593ms: mmap /nodclouddata/chi/src/SHARK/chatglm.vmfb
  +20850.829124450684ms: ireert.SystemContext created
  +20853.740215301514ms: module initialized
Successfully Loaded vmfb model
inputsid: 
tensor([[64790, 64792, 36474, 54591]])
output:
[[[nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]]]
hanhanW commented 10 months ago

Can you triage the issue like we've done above, and attach a reproducer like https://github.com/openxla/iree/issues/15661#issuecomment-1854720527?

AmosLewis commented 10 months ago

Can you triage the issue like we've done above, and attach a reproducer like #15661 (comment)?

Here is what I got: chatglm_fail_log_dispatch9_1218_with_max_15964.txt. It still breaks at dispatch 9, but this time it got stuck at INF for about 40 minutes. I appended the repro steps in the comments as well.

hanhanW commented 10 months ago
=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 inputs ===
f32=-INF
f16=0

=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 outputs ===
4x4xf16=[0 -INF -INF -INF][0 0 -INF -INF][0 0 0 -INF][0 0 0 0]

It looks like other dispatches generate -INF and pass it to jit_eval_8_dispatch_0. We should look further up the log to see where the first NAN/INF is generated. Here is a tip:

grep -B 5 --max-count=1 -n NAN /path-to-log
grep -B 5 --max-count=1 -n INF /path-to-log

This should navigate you to the first place that generates NAN/INF.

AmosLewis commented 10 months ago
=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 inputs ===
f32=-INF
f16=0

=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 outputs ===
4x4xf16=[0 -INF -INF -INF][0 0 -INF -INF][0 0 0 -INF][0 0 0 0]

It looks like other dispatches generate -INF and pass it to jit_eval_8_dispatch_0. We should look further up the log to see where the first NAN/INF is generated. Here is a tip:

grep -B 5 --max-count=1 -n NAN /path-to-log
grep -B 5 --max-count=1 -n INF /path-to-log

This should navigate you to the first place that generates NAN/INF.

(shark.venv) ➜  tmp git:(main) ✗ grep -B 5 --max-count=1 -n NAN ./1218_chatglm_forward9-dispatch-tensors.txt 
(shark.venv) ➜  tmp git:(main) ✗ grep -B 5 --max-count=1 -n INF ./1218_chatglm_forward9-dispatch-tensors.txt
53-
54-=== jit_eval_6_dispatch_0::jit_eval_6_dispatch_0_transpose outputs ===
55-f16=0
56-
57-=== jit_eval_8_dispatch_0::jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16 inputs ===
58:f32=-INF

I didn't find any dispatch that outputs INF into dispatch 8. I also tried printing the annotations here: 1218_chatglm_forward9-dispatch-tensors-annotation.mlir, then searched for jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16.

hanhanW commented 10 months ago

I know what's happening... This is happening in the const-eval stage, so all the inputs for these dispatches are constant data. It means that the frontend generates invalid constants or IREE reads the weights incorrectly. Two things come to mind:

  1. Do we add f64->f32 demotion in the frontend?
  2. Can you check if the frontend generates valid weights?
hanhanW commented 10 months ago
  1. Do we add f64->f32 demotion in the frontend?

If the weight is in f64 type, and we can't represent it using f32 type, it could become INF or -INF.

  2. Can you check if the frontend generates valid weights?

If the original weight is invalid, the bug is in the model itself.
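To illustrate point 1 (a hypothetical example, not taken from the model): an f64 value whose magnitude exceeds f32's largest finite value (about 3.4e38) turns into INF/-INF when demoted:

# f64 -> f32 demotion saturates to infinity outside the f32 range.
python3 -c "import numpy as np; print(np.float32(1e39), np.float32(-1e39))"
# prints: inf -inf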

AmosLewis commented 10 months ago
  1. Do we add f64->f32 demotion in the frontend?

If the weight is in f64 type, and we can't represent it using f32 type, it could become INF or -INF.

  2. Can you check if the frontend generates valid weights?

If the original weight is invalid, the bug is in the model itself.

I just elided the input chatglm-6b-int4.mlir with torch-mlir-opt --mlir-elide-elementsattrs-if-larger=4 chatglm-6b-int4.mlir > chatglm-6b-int4-elide.mlir and searched. Here is the result: https://storage.googleapis.com/shark-public/chi/iree/chatglm/9/1218/chatglm-6b-int4-elide.mlir. If we search for f64-to-f32 truncations, there are 57 demotion sites. There are tons of f64-to-f16 as well. It looks like:

%cst_427 = arith.constant 1.000000e-05 : f64
...    
%28 = linalg.generic {indexing_maps = [#map11, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%27 : tensor<4x1x1xf32>) outs(%24 : tensor<4x1x1xf32>) {
    ^bb0(%in: f32, %out: f32):
      %2160 = arith.truncf %cst_427 : f64 to f32
      %2161 = arith.addf %in, %2160 : f32
      linalg.yield %2161 : f32
    } -> tensor<4x1x1xf32>
%cst_428 = arith.constant 0.29730177875068026 : f64
...
%78 = linalg.generic {indexing_maps = [#map24, #map8], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%75 : tensor<1x32x4x128xf16>) outs(%74 : tensor<1x32x4x128xf16>) {
    ^bb0(%in: f16, %out: f16):
      %2160 = arith.truncf %cst_428 : f64 to f16
      %2161 = arith.mulf %in, %2160 : f16
      linalg.yield %2161 : f16
    } -> tensor<1x32x4x128xf16>
hanhanW commented 10 months ago

I have to go. One other thing we can try is adding the iree-llvmcpu-use-fast-min-max-ops flag to iree-compile. I don't know what the inputs are, but maybe they were always NaN/INF and now they are propagated as such.

(we should also rename the flag -- I will take a look tomorrow)

harrisonGPU commented 10 months ago

I know what's happening... This is happening in the const-eval stage, so all the inputs for these dispatches are constant data. It means that the frontend generates invalid constants or IREE reads the weights incorrectly. Two things come to mind:

  1. Do we add f64->f32 demotion in the frontend?
  2. Can you check if the frontend generates valid weights?

Hello @hanhanW, could you please tell me where the IREE frontend generates invalid constants, or where IREE reads the weights? I'm a new student and eager to learn about the IREE open-source project by investigating this issue. Thank you for your help!

hanhanW commented 10 months ago

update:

We can run the model without NaN on cascadelake in a clean build, so perhaps it can only be reproduced on a haswell CPU. I'm setting up an env on @AmosLewis's VM to see if I can reproduce the issue.

hanhanW commented 10 months ago

I am able to produce reasonable output, even on the same VM, if you don't use --iree-global-opt-enable-quantized-matmul-reassociation. The flag is off by default, which means it is a development flag; that path is not fully tested.

My experiments show that it is the root cause of the NaN: it produces NaN only if I add the flag. I don't know why it is added, but can we exclude the flag for now?
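For reference, a sketch of the compile command with the flag dropped (same flags as earlier in this thread, trimmed to the essentials):

# No --iree-global-opt-enable-quantized-matmul-reassociation this time.
iree-compile chatglm-6b-int4.mlir \
    --iree-input-type=tm_tensor \
    --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-target-cpu-features=host \
    --iree-llvmcpu-target-triple=x86_64-linux-gnu \
    --iree-llvmcpu-enable-ukernels \
    -o chatglm.vmfb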

AmosLewis commented 10 months ago

I am able to produce reasonable output, even on the same VM, if you don't use --iree-global-opt-enable-quantized-matmul-reassociation. The flag is off by default, which means it is a development flag; that path is not fully tested.

My experiments show that it is the root cause of the NaN: it produces NaN only if I add the flag. I don't know why it is added, but can we exclude the flag for now?

It looks like we are adding it here in SHARK: https://github.com/nod-ai/SHARK/blob/788cc9157c942a4c6f73e3a85f16b14c9ce4d4d5/shark/iree_utils/compile_utils.py#L46. @dan-garvey @monorimet can you help disable it in SHARK?

Max191 commented 10 months ago

I am able to produce reasonable output, even on the same VM, if you don't use --iree-global-opt-enable-quantized-matmul-reassociation. The flag is off by default, which means it is a development flag; that path is not fully tested. My experiments show that it is the root cause of the NaN: it produces NaN only if I add the flag. I don't know why it is added, but can we exclude the flag for now?

It looks like we are adding it here in SHARK: https://github.com/nod-ai/SHARK/blob/788cc9157c942a4c6f73e3a85f16b14c9ce4d4d5/shark/iree_utils/compile_utils.py#L46. @dan-garvey @monorimet can you help disable it in SHARK?

Yeah, we don't want to be adding this flag for anything other than llama2 on CPU. It is needed for llama2 performance, but it is still experimental.

AmosLewis commented 10 months ago

Using SHARK with this commit, https://github.com/nod-ai/SHARK/pull/2047, should fix the NAN issue. Could you try it, @manishghop?

(shark.venv) ➜  nan git:(main) ✗ python run_chatglm.py
/home/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
tensor([[64790, 64792, 36474, 54591]]) torch.Size([1, 4])
/home/chi/src/SHARK/nan/run_chatglm.py:13: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  input_ids = torch.tensor(input_ids).reshape([1, input_id_len])
Loading module chatglm.vmfb...
Successfully Loaded vmfb model
::: Detailed report (took longer than 5.0s):
  +0.3094673156738281ms: Load to device: torch.Size([1, 4])
  +0.5853176116943359ms: Invoke function: forward
  +6925.025939941406ms: Invoke complete
  +6925.110816955566ms: Result to host
[[[-10.83   -10.83     0.533  ... -10.84   -10.83   -10.84  ]
  [-12.5    -12.52     2.217  ... -12.54   -12.53   -12.51  ]
  [ -9.59    -9.59    -0.3699 ...  -9.62    -9.62    -9.61  ]
  [ -9.586   -9.58     1.07   ...  -9.56    -9.58    -9.57  ]]]
AmosLewis commented 10 months ago

Related issue: https://github.com/openxla/iree/issues/16068