cglagovichTT opened 4 days ago
fyi @avoraTT @mikevin920
@cglagovichTT for FF1, should we modify the Eltwise-binary op to take SiLU as an input activation?
Yes @yugaoTT, that would help a lot. I'd like to be able to do mul with SiLU on in0.
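For reference, the semantics being requested are SiLU applied to in0 before the eltwise multiply. A minimal numpy sketch of the intended math (function names here are illustrative, not the actual op API):

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. Swish): x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def mul_with_silu_on_in0(in0, in1):
    # Eltwise-binary mul with SiLU fused on the in0 operand,
    # i.e. out = silu(in0) * in1 — the pattern FF1 needs.
    return silu(in0) * in1
```

This is the gating pattern used in the FF1/FF3 branch of a SwiGLU-style MLP, where FF1's output is SiLU-activated and then multiplied elementwise with FF3's output.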
@kevinmiTT11 please update weight caches on CI machines. Then we should be good to merge dram sharded decode in.
Might be good to change the names of these weights to avoid conflicts.
Decode passes with FF2 and FF3 dram-sharded weights. We are leaving FF1 as-is because dram-sharded FF1 is slower than before due to the SiLU activation.
There is an issue with the prefill matmul2D with dram-sharded weights when in0 is batched and interleaved. @yugaoTT is looking into it.
branch: cglagovich/9642
repro: