Open manishghop opened 9 months ago
Using the Python API, the process remains stuck, but when using the command line (`iree-compile ./chatglm2-6b-int4.mlir -o ./chatglm.vmfb`) we get an error instead.
Can you attach the MLIR file to the issue? We need to decouple such issues from upstream projects as much as possible; I'm not able to look at this while the reproducer is at the framework level.
The errors are all about the stream dialect, so this does not seem like a codegen issue. We need some help triaging this. @ScottTodd, could you or others take a look?
I don't see any errors, just the module IR after dumping. We need the full console output (and reproducers).
@manishghop a couple of things that can help...
1) You seem to have the `.mlir` file, and that is working from the command line? If you can reproduce from the command line, can you include the `.mlir` file and the command line needed to invoke and reproduce the issue?
2) I am not as familiar with the Python flow; if it really is a difference between invoking from Python vs. the command line, then that will provide a clue that this is something else.
1) Sharing the `.mlir` file is not possible right now, as the Linux system I was working on seems to be undergoing an OS upgrade and is inaccessible.
2) I do have the `.mlir` file. For the conversion of `.mlir` to `.vmfb`, when I tried to use the Python API it didn't seem to progress further, so I had to kill the process. I then tried the `iree-compile` command to check the output, and it printed a large error in the terminal, of which I shared only as far back as I could scroll. This suggests to me that something is off in the bytecode conversion.
> I do have the `.mlir` file. For the conversion of `.mlir` to `.vmfb`, when I tried to use the Python API it didn't seem to progress further, so I had to kill the process. I then tried the `iree-compile` command to check the output, and it printed a large error in the terminal, of which I shared only as far back as I could scroll. This suggests to me that something is off in the bytecode conversion.
The conversion from `.mlir` to `.vmfb` is a full compilation, not just a format conversion. If you share:
a) the MLIR file,
b) the compile command, and
c) the log of the failure (you can just redirect the output to a file and share that),
we can help more.
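As a minimal sketch of the log capture suggested above (the `echo` lines are placeholders standing in for a real `iree-compile` invocation, and `compile_log.txt` is an assumed file name), merging stderr into stdout lets `tee` save the full failure log while still printing it to the terminal:

```shell
# Placeholder commands stand in for iree-compile; 2>&1 merges stderr into
# stdout so tee captures the complete failure log in compile_log.txt.
{ echo "compiling..."; echo "error: example failure" 1>&2; } 2>&1 | tee compile_log.txt
cat compile_log.txt
```

The same `2>&1 | tee compile_log.txt` suffix can be appended to the actual `iree-compile` command line.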
@manishghop @ScottTodd
With all the pre-IREE errors I fixed in https://github.com/llvm/torch-mlir/issues/2730 applied, I just ran export_chatglm2.py via `python export_chatglm2.py`. It takes a while, but in the end it saved the mlir and vmfb successfully. Here is the cmd output I just got:
```
[DEBUG] Compiling torchscript graph
[DEBUG] Lowering Torch -> Linalg
[DEBUG] Successfully Generated mlir on device
[DEBUG] converting to bytecode
Saved falcon mlir at chatglm-6b-int4.mlir
Compiling for device : cpu-task
Configuring for device:cpu-task
Target triple found:x86_64-linux-gnu
Saved vmfb in ./chatglm.vmfb.
Saved vic vmfb at ./chatglm.vmfb
```
You mentioned "Using python API, the process remains stuck", but I guess you just need to wait a little longer or use a more powerful machine?

And for the command `iree-compile ./chatglm2-6b-int4.mlir -o ./chatglm.vmfb` you used, I think you probably need more flags to replicate what the Python call `shark_module.save_module` did. It should look something like this:
```shell
iree-compile chatglm-6b-int4.mlir \
  --iree-input-type=tm_tensor \
  --iree-vm-bytecode-module-output-format=flatbuffer-binary \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu-features=host \
  --iree-llvmcpu-target-triple=x86_64-linux-gnu \
  --iree-llvmcpu-enable-ukernels \
  --iree-llvmcpu-stack-allocation-limit=256000 \
  --iree-global-opt-enable-quantized-matmul-reassociation \
  --iree-stream-resource-max-allocation-size=4294967295 \
  --iree-vm-bytecode-module-strip-source-map=true \
  --iree-util-zero-fill-elided-attrs \
  -o /tmp/chatglm.vmfb
```
The only error I saw is when I ran run_chatglm.py with the generated chatglm.vmfb:
```
(shark.venv) ➜ chatglm python run_chatglm.py
/home/chi/src/SHARK/shark.venv/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Loading module chatglm.vmfb...
[DEBUG] setting iree runtime flags for cpu:
--task_topology_max_group_count=30
--task_topology_max_group_count=30
[DEBUG] setting iree runtime flags for cpu:
--task_topology_max_group_count=30
Successfully Loaded vmfb model
Traceback (most recent call last):
  File "/home/chi/src/test/chatglm/run_chatglm.py", line 109, in <module>
    first_output = shark_module.forward(inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chi/src/SHARK/shark/shark_inference.py", line 159, in forward
    return self.shark_runner.run(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chi/src/SHARK/shark/shark_runner.py", line 115, in run
    return get_results(
           ^^^^^^^^^^^^
  File "/home/chi/src/SHARK/shark/iree_utils/compile_utils.py", line 651, in get_results
    result = compiled_vm[function_name](*device_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chi/src/iree-build/runtime/bindings/python/iree/runtime/function.py", line 137, in __call__
    self._invoke(arg_list, ret_list)
  File "/home/chi/src/iree-build/runtime/bindings/python/iree/runtime/function.py", line 162, in _invoke
    self._vm_context.invoke(self._vm_function, arg_list, ret_list)
ValueError: Error invoking function: c/runtime/src/iree/modules/hal/utils/buffer_diagnostics.c:225: INVALID_ARGUMENT; input0 shape dimension 1 mismatch; expected 20 but have 9; expected shape `1x20`, actual shape `1x9`; while invoking native function hal.buffer_view.assert; while calling import;
[ 1] native hal.buffer_view.assert:0 -
[ 0] bytecode module@0:3570 -
```
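The mismatch above (expected `1x20`, got `1x9`) is just the static input shape baked in at export time. A minimal sketch of the padding workaround, using a hypothetical `pad_to_static_shape` helper and an assumed pad id of 0 (the real ChatGLM2 tokenizer's pad token may differ):

```python
# Hypothetical helper: the module was compiled for a fixed input shape of
# (1, 20), so shorter token sequences must be right-padded before forward().
def pad_to_static_shape(token_ids, static_len, pad_id=0):
    """Right-pad a list of token ids to the fixed length the vmfb expects."""
    if len(token_ids) > static_len:
        raise ValueError(
            f"prompt has {len(token_ids)} tokens but the module "
            f"was compiled for a fixed length of {static_len}"
        )
    return token_ids + [pad_id] * (static_len - len(token_ids))

# A 9-token prompt (shape 1x9) padded up to the compiled length of 20.
padded = pad_to_static_shape(list(range(1, 10)), 20)
print(len(padded))  # 20
```

Note this only hides the symptom for short prompts; padding cannot make the module accept sequences longer than the compiled shape.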
Then I tried to run it with a different input size, and everything looks good:
```shell
iree-run-module \
  --device=local-task \
  --module="chatglm.vmfb" \
  --function=forward \
  --input="1x20xi64=1"
```

```
EXEC @forward
result[0]: hal.buffer_view
1x20x65024xf16=[[-4.12891 -4.14062 7.33984 -3.08398 -4.09766 -4.10547 -4.11328 -4.125 -4.10547 -4.11328 -4.12109 -4.12891 -0.742188 11.8906 -4.10547 -4.13672 ... (long tensor dump truncated) ...]]
```
I'm trying to get up to speed here. A few things stand out:

1. Please attach the `.mlir`/`.mlirbc` file, the flags used to compile, and any runtime code (C/Python/tools/etc.) used to load and run the compiled program. The current bug report issue template has a deliberate section for "Additional context" - please use that.
2. Run with `--mlir-print-skip-regions --mlir-print-ir-before-all` (or similar) to get a semi-reasonable amount of logs showing what the compiler is working on and whether it is stalled somewhere.

Regarding version info, I'm trying to reproduce on my Windows dev machine now, and a few deps were missing. Please provide specific reproduction instructions for issues if you can.
```
λ python -m venv .venv
λ .venv\Scripts\activate.bat
(.venv) λ python -m pip install shark-turbine
(.venv) λ python -m pip show shark_turbine | grep Version
Version: 0.9.3
(.venv) λ python -m pip show torch | grep Version
Version: 2.1.2
(.venv) λ python .\export_chatglm2.py > export_chatglm2_output.txt
Traceback (most recent call last):
  File "D:\dev\scratch\iree_2024_01_18\export_chatglm2.py", line 1, in <module>
    from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
ModuleNotFoundError: No module named 'transformers'
(.venv) λ python -m pip install transformers
(.venv) λ python -m pip show transformers | grep Version
Version: 4.36.2
(.venv) λ python .\export_chatglm2.py > export_chatglm2_output.txt
Traceback (most recent call last):
  File "D:\dev\scratch\iree_2024_01_18\export_chatglm2.py", line 3, in <module>
    import torch_mlir
ModuleNotFoundError: No module named 'torch_mlir'
```
Does SHARK-Turbine include torch_mlir somehow? Should I even try installing torch_mlir on its own, or could that lead to conflicts?
Also, instructions on https://github.com/llvm/torch-mlir look to be outdated, since Windows works too?
> At the time of writing, we release pre-built snapshot of torch-mlir for Python 3.11 on Linux and macOS.
(edit: sent https://github.com/llvm/torch-mlir/pull/2771 to tweak those instructions)
```
(.venv) λ python -m pip install torch-mlir -f https://llvm.github.io/torch-mlir/package-index/
(.venv) λ python -m pip show torch_mlir | grep Version
Version: 20240118.1087
(.venv) λ python .\export_chatglm2.py > export_chatglm2_output.txt
D:\dev\scratch\iree_2024_01_18\.venv\Lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Traceback (most recent call last):
  File "D:\dev\scratch\iree_2024_01_18\export_chatglm2.py", line 12, in <module>
    from shark.shark_downloader import download_public_file
ModuleNotFoundError: No module named 'shark'
```
Looks like I need shark from https://github.com/nod-ai/SHARK too?
@ScottTodd you need the shark.venv from https://github.com/nod-ai/SHARK, not the SHARK-Turbine env. SHARK-Turbine uses torch-mlir as a subproject in IREE, so it doesn't need the torch-mlir Python package installed. But export_chatglm2.py was developed on SHARK, which was designed to use the torch-mlir and IREE Python packages separately.

To initialize the shark.venv on Ubuntu, use:
```shell
git clone https://github.com/nod-ai/SHARK
cd SHARK
PYTHON=python3.11 ./setup_venv.sh
source ./SHARK/shark.venv/bin/activate
```
Still trying to repro, and version information is still needed. SHARK's setup instructions are confusing... there are multiple paths through them and the venv setup scripts are not portable across operating systems or directories:

- `setup_venv.sh` fails for several reasons on my Windows machine under bash with https://cmder.app/. I don't have 'python3', just 'python'. My 'python' path has spaces in it (`C:\Program Files\Python311\python.exe`). Venv activation with `source "$VENV_DIR/bin/activate"` does not work on Windows (run `$VENV_DIR\Scripts\activate.bat` instead).
- `setup_venv.ps1` looks like it needs to run in the source directory, but I'd rather set up a virtual environment in an external temp dir...

Probably worth asking/tracking that on the SHARK project.
Got a venv setup and tried running the script...
```
D:\dev\projects\SHARK\shark.venv\Lib\site-packages\torch\utils\_pytree.py:255: UserWarning: <class 'torch.Size'> is already registered as pytree node. Overwriting the previous registration.
  warnings.warn(
Traceback (most recent call last):
  File "D:\dev\scratch\iree_2024_01_19\export_chatglm2.py", line 121, in <module>
    ts_graph = import_with_fx(
               ^^^^^^^^^^^^^^^
  File "D:\dev\projects\SHARK\shark\shark_importer.py", line 697, in import_with_fx
    from brevitas_examples.llm.llm_quant.sharded_mlir_group_export import (
  File "D:\dev\projects\SHARK\shark.venv\Lib\site-packages\brevitas_examples\llm\llm_quant\sharded_mlir_group_export.py", line 58, in <module>
    from brevitas_examples.llm.llm_quant.mlir_custom_mm import brevitas_matmul_rhs_group_quant_library
  File "D:\dev\projects\SHARK\shark.venv\Lib\site-packages\brevitas_examples\llm\llm_quant\mlir_custom_mm.py", line 12, in <module>
    from torch_mlir.dialects.torch.importer.jit_ir.build_tools.registry import \
ModuleNotFoundError: No module named 'torch_mlir.dialects.torch.importer'
```
Full logs here: https://gist.github.com/ScottTodd/9c4c62170ea7f1be5088def18cf553ea
Can someone who ran this extract and share the .mlir, or at least provide specific repro instructions? Ideally repro instructions would use published Python packages... these local builds are really unstable and difficult to pin down across systems.
Fixed here https://github.com/llvm/torch-mlir/issues/2730#issuecomment-1896442202
I'm not sure I'd call a local patch to a .venv folder a "fix"... is there a stable version somewhere I could use? (Also, again - please share the .mlir and/or more specific repro instructions, this is a ton of back and forth for basic issue reporting and triage).
If you have the shark.venv set up and activated already, you can extract the version information with:
```shell
pip list | grep iree-
printf "shark SHA: %s\n" "$(git log --pretty=format:'%H' -n 1)"
printf "iree-compile SHA: %s\n" "$(python -c "import iree.compiler.version as v; print(v.REVISIONS['IREE'])")"
printf "iree-runtime SHA: %s\n" "$(python -c "import iree.runtime.version as v; print(v.REVISIONS['IREE'])")"
```
What happened?
Model: https://huggingface.co/THUDM/chatglm2-6b
Issue: error while applying padding to the input tokens.
During inference, the output token is appended to input_ids for the next forward pass, so the length of input_ids increases by 1 on each subsequent pass. To work around this, we tried padding with several values of max_length (1, 5, 10, 15, 20). The execution flow breaks while converting .mlir to .vmfb.
The code to compile the PyTorch model to .mlir & .vmfb: export_chatglm2.py
This is what we are trying to replicate (without using SHARK): chatglm.py. Using it as a reference, we are trying to replicate the same flow with nod.ai SHARK.
This is what we aim to do using SHARK: run_chatglm.py. Ideally we want the vmfb model to handle differently sized prompts rather than being bound to a fixed shape, which is currently not possible.
Reason: during inference, while generating new tokens, we append the tokens predicted in the first forward pass to input_ids for the subsequent passes. The initial code didn't account for dynamic changes to the shape of input_ids; it expects input_ids to keep the shape that was passed at torch-mlir compilation time. For example, for "What is the capital of Canada?" the shape is (1, 9). During inference it expects input_ids to always be (1, 9), but from the second forward pass onwards the shape of input_ids grows by 1 each step until the stopping criterion is reached, hence it raises an error.
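The growth described above can be shown with a toy sketch (the `fake_forward` stand-in is purely illustrative; the real compiled model returns logits, not a token):

```python
# Each decode step appends the predicted token to input_ids, so a module
# compiled for a static shape like (1, 9) only accepts the first step.
def fake_forward(input_ids):
    # Stand-in for the compiled model: pretend the "next token" is last + 1.
    return input_ids[-1] + 1

input_ids = [1, 2, 3]          # e.g. shape (1, 3) after tokenization
shapes = []
for _ in range(4):             # four decode steps
    shapes.append(len(input_ids))
    input_ids.append(fake_forward(input_ids))

print(shapes)  # [3, 4, 5, 6]
```

A module exported with a fixed input shape would reject every step after the first, which is exactly the `expected 20 but have 9` class of error seen at runtime.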
Using the Python API, the process remains stuck, but when using the command line `iree-compile ./chatglm2-6b-int4.mlir -o ./chatglm.vmfb`, the error we get is:
Steps to reproduce your issue
1) Run (export_chatglm2.py) to compile pytorch model to .mlir & .vmfb.
What component(s) does this issue relate to?
Compiler, Runtime