toelli-msft closed this 2 years ago
Thanks! For sqrl, the 4x4 size is the targeted one, so this is great.
On that point, do we really want to benchmark "vsqrl"? https://github.com/microsoft/knossos-ksc/issues/966
Shall we work to merge this?
Yes, the current version is a good start. We can increase the problem sizes during or after the implementation of vsqrl.
What we actually want here is vsqrl, so I suggest we wait until vsqrl can be implemented via https://github.com/microsoft/knossos-ksc/pull/1010 and then implement vsqrl in C++ rather than just sqrl.
vsqrl is now in. What should the next steps be here?
The next step is to write vsqrl in C++. I'm not sure if I'll get to that, I'm afraid.
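For anyone picking this up, here is a rough sketch of what a handwritten C++ vsqrl could look like. It is not the PR's code. It assumes sqrl is the knossos example function that sums its input, picks t = -0.125 * x if the sum is negative and t = 0.5 * x^2 otherwise, and returns mean(sin(t) * t). Please check that definition against the repository before relying on it. It also assumes vsqrl simply applies sqrl independently to each 4x4 tile in a batch. `Mat4` is a hypothetical flat 4x4 type and does not come from the benchmark.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat 4x4 tile (row-major); the real benchmark layout may differ.
using Mat4 = std::array<double, 16>;

// Assumed definition of sqrl (unverified against the repository):
//   y = sum(x); t = (y < 0) ? -0.125 * x : 0.5 * x^2; return mean(sin(t) * t)
double sqrl(const Mat4& x) {
    double y = 0.0;
    for (double v : x) y += v;  // branch condition depends on the whole tile

    double acc = 0.0;
    for (double v : x) {
        double t = (y < 0.0) ? -0.125 * v : 0.5 * v * v;
        acc += std::sin(t) * t;
    }
    return acc / static_cast<double>(x.size());  // mean over the 16 elements
}

// Assumed meaning of vsqrl: map sqrl over a batch of 4x4 tiles,
// producing one scalar per tile.
std::vector<double> vsqrl(const std::vector<Mat4>& xs) {
    std::vector<double> out;
    out.reserve(xs.size());
    for (const Mat4& x : xs) out.push_back(sqrl(x));
    return out;
}
```

Batching many small tiles into a single call is presumably what makes vsqrl a more realistic benchmark than a single 4x4 sqrl, because per-call overhead gets amortized. A backward pass would need a separate hand-derived gradient, which this sketch leaves out.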
No problem, this should go in anyway.
The handwritten C++ beats or roughly matches PyTorch in all cases except the backward pass at large sizes.