please attach a link to the .mlir and iree command line to execute / recreate it
+1. It would be much easier for me to look into the issue with an input mlir file. Also as @stellaraccident asked in the other issue, is this specific to M2? (I'd suspect not but need to double check.)
Command:
<PATH TO ..../iree-compile> - --iree-input-type=mhlo --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --iree-llvm-embedded-linker-path=<PATH TO ..../iree-lld> --mlir-print-debuginfo --mlir-print-op-on-diagnostic=true --iree-llvm-target-cpu-features=host --iree-mhlo-demote-i64-to-i32=false --iree-flow-demote-i64-to-i32 -iree-vulkan-target-triple=m1-moltenvk-macos --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64
Input MLIR: https://storage.googleapis.com/shark_tank/dbmdz_convbert-base-turkish-cased_tf/dbmdz_convbert-base-turkish-cased_tf.mlir
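As a convenience, here is a minimal Python sketch that fetches the input MLIR and drives the same iree-compile invocation; the tool paths and the output file name are placeholders/assumptions, and the file is passed directly instead of via stdin:

# Reproduction driver (sketch). IREE_COMPILE/IREE_LLD paths and the output name are assumptions.
import subprocess
import urllib.request

MLIR_URL = "https://storage.googleapis.com/shark_tank/dbmdz_convbert-base-turkish-cased_tf/dbmdz_convbert-base-turkish-cased_tf.mlir"
MLIR_PATH = "dbmdz_convbert-base-turkish-cased_tf.mlir"
IREE_COMPILE = "/path/to/iree-compile"  # placeholder
IREE_LLD = "/path/to/iree-lld"          # placeholder

urllib.request.urlretrieve(MLIR_URL, MLIR_PATH)  # download the input MLIR

subprocess.run([
    IREE_COMPILE, MLIR_PATH,
    "--iree-input-type=mhlo",
    "--iree-vm-bytecode-module-output-format=flatbuffer-binary",
    "--iree-hal-target-backends=vulkan",
    "--iree-llvm-embedded-linker-path=" + IREE_LLD,
    "--mlir-print-debuginfo",
    "--mlir-print-op-on-diagnostic=true",
    "--iree-llvm-target-cpu-features=host",
    "--iree-mhlo-demote-i64-to-i32=false",
    "--iree-flow-demote-i64-to-i32",
    "--iree-vulkan-target-triple=m1-moltenvk-macos",
    "--iree-stream-resource-index-bits=64",
    "--iree-vm-target-index-bits=64",
    "-o", "convbert_vulkan.vmfb",
], check=True)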
@antiagainst We checked and found that this issue is also present on M1 Vulkan, as suspected.
We are again seeing different results with and without --iree-flow-trace-dispatch-tensors:
local-task:
1x16x32000xf32=[[0.607319 -1.07316 0.898614 -0.267287 1.78744 -0.263523 1.01242 -0.2313 -2.19909 -2.82577 -2.44984 0.527114 -0.46196 0.275833 -1.16742 -0.420368 ...
vulkan (w/ tracing):
1x16x32000xf32=[[0.607278 -1.0732 0.898576 -0.267251 1.78746 -0.263542 1.01245 -0.231225 -2.19902 -2.82572 -2.44985 0.52708 -0.461983 0.275756 -1.1674 -0.420419 ...
vulkan (w/o tracing):
1x16x32000xf32=[[-0.698902 -1.57589 1.41733 0.851334 1.94573 -0.392987 1.02575 -0.61025 -3.48231 -3.52306 -1.55764 -0.271172 -0.189102 -0.334609 -0.209776 0.191701
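To make the mismatch concrete, here is a small numpy sketch of the kind of tolerance check that flags it, using the first few printed values from the dumps above; the rtol/atol values are illustrative assumptions, not SHARK's exact thresholds:

# Tolerance check over the leading values of the three dumps above (sketch).
import numpy as np

local_task      = np.array([0.607319, -1.07316, 0.898614, -0.267287, 1.78744])
vulkan_traced   = np.array([0.607278, -1.0732, 0.898576, -0.267251, 1.78746])
vulkan_untraced = np.array([-0.698902, -1.57589, 1.41733, 0.851334, 1.94573])

print(np.allclose(local_task, vulkan_traced, rtol=1e-3, atol=1e-3))    # True: matches within tolerance
print(np.allclose(local_task, vulkan_untraced, rtol=1e-3, atol=1e-3))  # False: far outside tolerance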
With tracing enabled the result is correct. Last time the discrepancy went away, but I guess we just got lucky. We still need to root-cause this properly.
You can try compiling with --iree-stream-partitioning-favor=debug, which disables all concurrency and puts a barrier between each dispatch; that would narrow down whether multiple dispatches are stomping on each other or whether it is a host/device issue.
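A sketch of re-running the compile with that flag appended (a trimmed-down subset of the flags from the command above; paths and output name are placeholders/assumptions):

# Recompile with all concurrency disabled to see whether dispatches are
# stomping on each other (sketch; paths are assumptions).
import subprocess

subprocess.run([
    "/path/to/iree-compile", "dbmdz_convbert-base-turkish-cased_tf.mlir",
    "--iree-input-type=mhlo",
    "--iree-hal-target-backends=vulkan",
    "--iree-mhlo-demote-i64-to-i32=false",
    "--iree-flow-demote-i64-to-i32",
    "--iree-vulkan-target-triple=m1-moltenvk-macos",
    "--iree-stream-resource-index-bits=64",
    "--iree-vm-target-index-bits=64",
    "--iree-stream-partitioning-favor=debug",  # barrier between every dispatch
    "-o", "convbert_vulkan_debug.vmfb",
], check=True)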
Closing this for now given this is Vulkan on MoltenVK -- we have native Metal support and that's the way forward.
What happened?
On comparing the results obtained from TensorFlow with the SHARK results, the difference exceeds the tolerance range. The following error message is shown:
Steps to reproduce your issue
The error can be reproduced using the following script:
What component(s) does this issue relate to?
No response
Version information
No response
Additional context
To use the m1-moltenvk-macos target triple on Apple M2, please make the following change in the SHARK source code file https://github.com/nod-ai/SHARK/blob/d556c0d6ef8f69b32bc3b2d28165345dd2faf403/shark/iree_utils/vulkan_utils.py#L23
replace:
if vulkan_device == "M1":
with:
if vulkan_device == "M1" or vulkan_device == "M2":
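For illustration only, a standalone sketch of what the adjusted check could look like; the helper name, its signature, and the error fallback are hypothetical, and only the changed condition and the returned triple mirror the SHARK line linked above:

# Hypothetical helper mirroring the one-line change described above; the real
# logic lives in shark/iree_utils/vulkan_utils.py (see link above).
def get_macos_vulkan_triple(vulkan_device: str) -> str:
    # Per the workaround above, reuse the M1 triple for M2 as well.
    if vulkan_device == "M1" or vulkan_device == "M2":
        return "m1-moltenvk-macos"
    raise ValueError(f"No Vulkan target triple known for device: {vulkan_device}")

print(get_macos_vulkan_triple("M2"))  # -> m1-moltenvk-macos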