Currently, the 'thread_stride' of NestedLayoutAttr is misinterpreted as the access stride of the multi-dimensional vector.
However, it actually corresponds to the tid -> vtid mapping, and the undistributed vector is packed as:
  subgroup x batch x outer x thread x element
where vtid indexes the 'thread' dimension.
Therefore, this commit stops using 'thread_stride' and 'subgroups_stride' when calculating the base constant offset, and instead derives the offset from the packed undistributed vector shape.
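A minimal sketch of the idea for a single dimension, assuming row-major packing of subgroup x batch x outer x thread x element. The helper names (packed_strides, base_offset, vtid_from_tid) are hypothetical and not the actual implementation. The tid -> vtid formula, (tid // thread_stride) % thread_tile, is an assumption about how the stride is used. The point is that the offset strides come from the packed tile sizes, while 'thread_stride' only takes part in computing vtid.

```python
from typing import List, Sequence


def packed_strides(tile_sizes: Sequence[int]) -> List[int]:
    # Row-major strides of the packed (subgroup, batch, outer, thread, element)
    # shape of one undistributed dimension; the innermost (element) stride is 1.
    strides = [1] * len(tile_sizes)
    for i in range(len(tile_sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * tile_sizes[i + 1]
    return strides


def vtid_from_tid(tid: int, thread_stride: int, thread_tile: int) -> int:
    # Assumption: 'thread_stride' only maps the hardware tid to the virtual
    # thread id (vtid); it is not an access stride into the vector.
    return (tid // thread_stride) % thread_tile


def base_offset(subgroup_tile: int, batch_tile: int, outer_tile: int,
                thread_tile: int, element_tile: int,
                subgroup_id: int, vtid: int) -> int:
    # Hypothetical helper: base constant offset (batch = outer = element = 0)
    # taken from the packed shape rather than from the layout's strides.
    sg_stride, _, _, thread_stride_in_vector, _ = packed_strides(
        [subgroup_tile, batch_tile, outer_tile, thread_tile, element_tile])
    return subgroup_id * sg_stride + vtid * thread_stride_in_vector
```

For example, with tiles 2 x 2 x 1 x 4 x 4 the packed strides are [32, 16, 16, 4, 1]. Thread tid = 13 with thread_stride = 4 maps to vtid = 3. In subgroup 1 it then starts at offset 1 * 32 + 3 * 4 = 44. Real layouts apply this per dimension.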