nod-ai / sharktank

SHARK Inference Modeling and Serving
Apache License 2.0

[model] Avoid const folding extsi ops on weights #80

Closed antiagainst closed 2 weeks ago

antiagainst commented 2 weeks ago

With constant weights and extsi ops consuming them, we should not const-fold those extsi ops; doing so undoes the quantization at the model level.
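A minimal NumPy sketch of why this matters (the tensor values here are made up for illustration): the weight is stored in its quantized i8 form, and a sign extension akin to `arith.extsi` widens it to i32 at runtime. If the compiler const-folds the extsi into the weight constant, the stored constant becomes the widened i32 tensor, using 4x the memory and discarding the quantized representation.

```python
import numpy as np

# Hypothetical quantized weight, stored as i8 in the model.
w_i8 = np.array([[-128, 5], [7, 127]], dtype=np.int8)

# Sign extension (analogous to arith.extsi i8 -> i32) done at runtime
# so wider accumulators can be used.
w_i32 = w_i8.astype(np.int32)

# Const-folding the extsi would bake w_i32 into the module as the
# weight constant: same values, but 4x the bytes, and the compact
# quantized i8 form is gone.
print(w_i8.nbytes, w_i32.nbytes)  # the i32 constant is 4x larger
```

For a full model whose weights dominate the module size, folding these sign extensions effectively quadruples the weight storage, which is why the const folding should be avoided here.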

hanhanW commented 2 weeks ago

What is the priority? I finally got some time back for data-tiling work, and I'm wondering when I should context-switch to this issue. It's been a while since I touched the code, so it would take some time to learn what's going on today.

antiagainst commented 2 weeks ago

High. It's blocking compilation of the full model now. I assigned it to you because I only recall that you were working on this previously. :D Feel free to reassign it to somebody else who is more familiar and has touched it recently!

antiagainst commented 2 weeks ago

We can just pass --iree-opt-const-eval=false to disable const evaluation for now, as @MaheshRavishankar mentioned. So closing.
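For reference, a sketch of how the workaround would be applied on the command line. The flag comes from the comment above; the input/output file names and target flag are placeholders, not taken from this thread:

```shell
# Disable the compiler's constant evaluation so extsi ops on weights
# are not folded into widened constants.
iree-compile --iree-opt-const-eval=false \
    --iree-hal-target-backends=llvm-cpu \
    model.mlir -o model.vmfb
```

This trades away whatever other const-evaluation wins apply to the module, so it is a stopgap until const folding can selectively skip extsi ops on weights.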