A pure, low-level tensor program representation enabling tensor program optimization via program rewriting. See the web demo at https://gussmith23.github.io/glenside-web-demo/
Improve codegen for `access-transpose`, or optimize `access-transpose`s out #91
We end up with `access-transpose`s in the extracted workload. Can we find a way to avoid these, pre-transpose the data ahead of time, or find some other way to compute them efficiently?
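One way to "optimize them out" would be a rewrite that fuses back-to-back `access-transpose`s by composing their axis permutations, and drops the pair entirely when the composite permutation is the identity. Below is a minimal sketch of just that bookkeeping, in plain Rust with hypothetical helper names; it assumes `access-transpose` carries an explicit axis permutation and is not Glenside's actual rewrite machinery:

```rust
/// Compose two axis permutations: `outer` applied after `inner`.
/// Assumes `transpose(t, p)` places input axis `p[i]` at output axis `i`,
/// so transpose(transpose(t, inner), outer) == transpose(t, compose(inner, outer)).
fn compose(inner: &[usize], outer: &[usize]) -> Vec<usize> {
    outer.iter().map(|&i| inner[i]).collect()
}

/// True when the permutation leaves every axis in place, i.e. the
/// back-to-back transposes cancel and could be rewritten away.
fn is_identity(perm: &[usize]) -> bool {
    perm.iter().enumerate().all(|(i, &p)| i == p)
}

fn main() {
    let p1 = [2, 0, 1]; // first access-transpose
    let p2 = [1, 2, 0]; // second access-transpose applied on top of it
    let fused = compose(&p1, &p2);
    assert!(is_identity(&fused)); // the pair is a no-op here and can be removed
    println!("fused permutation: {:?}", fused);
}
```

Even when the fused permutation is not the identity, replacing two `access-transpose`s with one should cut the data movement the codegen has to emit; the remaining single transpose could then either be pre-applied to constant data or lowered to a dedicated kernel.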