Closed: jeremyfowers closed this issue 3 days ago
Temporary solution: mlagility.analysis will use the model's weights to calculate the hash. That provides a workaround: give the two models different weights, which would otherwise have been ignored.
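As a rough illustration of the workaround, folding the weight values into the hash makes two models that share a compute graph but carry different weights hash differently. This is only a sketch, not the actual mlagility.analysis implementation; the `weight_hash` function and the plain-dict `state_dict` format are hypothetical:

```python
import hashlib

def weight_hash(state_dict) -> str:
    """Hypothetical sketch: hash parameter names and values so that
    models differing only in weights still get distinct hashes."""
    h = hashlib.sha256()
    for name in sorted(state_dict):  # stable order for determinism
        h.update(name.encode())
        h.update(repr(state_dict[name]).encode())
    return h.hexdigest()[:8]

# Same architecture, different weights -> different hashes
a = weight_hash({"fc.weight": [0.1, 0.2]})
b = weight_hash({"fc.weight": [0.3, 0.4]})
```

The downside noted below follows directly: if the weights change between detections, the same model produces a new hash each time.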
Proposed long term solution: take the shape of model inputs into account when calculating the hash.
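The long-term proposal could look roughly like the following sketch, which mixes the input shapes into the hash alongside the graph. The `model_hash` function and its string graph representation are hypothetical stand-ins, not mlagility's actual hashing code:

```python
import hashlib

def model_hash(graph_repr: str, input_shapes) -> str:
    """Hypothetical sketch: combine the compute-graph representation with
    the input shapes, so e.g. 1D vs. 2D inputs hash differently."""
    h = hashlib.sha256()
    h.update(graph_repr.encode())
    for shape in sorted(tuple(s) for s in input_shapes):
        h.update(repr(shape).encode())
    return h.hexdigest()[:8]

# Same graph, different input shapes (the KV-cache on/off case) -> different hashes
with_cache = model_hash("llama_graph", [(1,)])
without_cache = model_hash("llama_graph", [(1, 512)])
```

With this scheme the two LLaMA invocations described below would get distinct hashes even though their compute graphs are identical.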
I agree that taking the shape of model inputs into account when calculating the hash is the right long-term solution.
Note: The temporary solution could be dangerous, since the same model might be detected multiple times when its weights are loaded or changed. Not sure if we have a way of avoiding that.
EDIT: I had the cause of the problem wrong in the original copy
For example, LLaMA with KV-caching enabled vs. disabled has the same compute graph, and therefore the same mlagility hash. However, those two invocations of LLaMA have completely different compute because the input shapes are completely different (the former has 1D inputs and the latter has 2D inputs). So they should have different mlagility hashes and appear separately in the benchit status. Giving the two ONNX graphs different model weights does not help the problem because weights are not considered in mlagility hashing.
cc @danielholanda @ramkrishna2910