mnn-llm produces different results on every run when using the CUDA backend

wangzhaode / mnn-llm

LLM deployment project based on MNN.

Apache License 2.0

1.46k stars 159 forks

Open BaofengZan opened 3 weeks ago

BaofengZan commented 3 weeks ago

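The model uses fuse attention. Results are correct on the CPU, but after switching to CUDA it reports the problem above, and the results are different on every run. Is this because fuse attention is not supported?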

wangzhaode commented 3 weeks ago

The CUDA backend does not support the fuse attention operator yet.

BaofengZan commented 3 weeks ago

OK, thanks.

BaofengZan commented 2 weeks ago

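Also, when I export the MNN model without enabling fuseAttention and run it several times on the CUDA backend, the results are sometimes correct and sometimes wrong. What could be causing this?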