Open yipengLeo opened 1 month ago
@yipengLeo thanks for pointing this out, there should be two add operations per block in llama models as far as i remember, let me look into this
Hi @yipengLeo, there is indeed a bug here. However, the impact is relatively little as the add_time
is a small percentage of the time.
For all the models supported right now except phi-2
, there are two add operations in each layer.
(only phi-2
has post_attn_norm: false
in its model config)
1st add: https://github.com/microsoft/vidur/blob/main/vidur/profiling/mlp/mlp_impl.py#L172
2nd add: https://github.com/microsoft/vidur/blob/main/vidur/profiling/mlp/mlp_impl.py#L179
So, if we ignore phi-2
, it would be better to add add_time twice. A proper solution would be to consider these slight differences between model architectures. Please feel free to raise a PR for fixing this!
I found the
_get_block_execution_time()
function in thevidur/entities/execution_time.py
path computes 'add_time' only once.But in other places, for example, under the
vidur/metrics/metrics_store.py
path. 'add_time' is calculated twiceIs this a bug?