microsoft / knossos-ksc

Compiler with automatic differentiation

relu3 upper bounds with compiler flags (benchmark combinations report) #942

Closed toelli-msft closed 2 years ago

toelli-msft commented 3 years ago

This PR allows setting compiler flags for the compilation of embedded C++ and KS benchmarks. See https://github.com/microsoft/knossos-ksc/pull/942#issuecomment-880665282 for up-to-date benchmark numbers. The most important benchmark result: with the right compiler flags, the `if` version is as fast as the masking version (and both are a bit faster than PyTorch)!

This PR introduces the CFlags class for bundling together compiler flags we might feed to gcc or cl. It's a bit hokey but seems to strike a good balance between simplicity and robustness. An alternative could be to pass around flags for both compilers separately, but that's rather messy and not robust to adding a new compiler in the future.
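To make the idea of bundling flags concrete, here is a minimal sketch of what such a class could look like. It is not the code from this PR: the field names, the `for_compiler` method and the `+` operator are hypothetical, chosen only for illustration. The gcc flags are the ones quoted in this thread; the cl flags are an assumed rough counterpart.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CFlags:
    """Bundle of compiler flags, one tuple per supported compiler.

    Hypothetical sketch: the real CFlags class in the PR may differ.
    """

    gcc: Tuple[str, ...] = ()
    cl: Tuple[str, ...] = ()

    def for_compiler(self, compiler: str) -> List[str]:
        # Pick the flags that apply to the compiler actually in use.
        if compiler == "gcc":
            return list(self.gcc)
        if compiler == "cl":
            return list(self.cl)
        raise ValueError(f"Unknown compiler: {compiler}")

    def __add__(self, other: "CFlags") -> "CFlags":
        # Combining two bundles concatenates the flags per compiler.
        return CFlags(gcc=self.gcc + other.gcc, cl=self.cl + other.cl)


# gcc flags as quoted in this thread; cl flags are an assumed counterpart.
fast = CFlags(
    gcc=("-march=native", "-funroll-loops", "-ffast-math",
         "-mprefer-vector-width=512"),
    cl=("/O2", "/fp:fast", "/arch:AVX512"),
)
optimised = CFlags(gcc=("-O3",), cl=()) + fast
```

Supporting a new compiler in a design like this means adding one field and one branch, rather than threading another argument through every call site.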

Old

As of 207d412f6b8885dfa8a9272443d6929eff1f6c91 we get a 10%-30% improvement by using the compiler flags -march=native -funroll-loops -ffast-math -mprefer-vector-width=512. I don't (yet) know which specific flags make the difference.

[3 images: benchmark results]

toelli-msft commented 3 years ago

[Now rebased]

toelli-msft commented 3 years ago

Interestingly, the version using `if` is as fast as the version using masking when these new compiler flags are on (c8aea069c857c089bd916e6c6d77bd69a004cd75).

[3 images: benchmark results]
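For readers unfamiliar with the two formulations being compared, here is a small scalar sketch of a branching version and a masking version. The piecewise definition of relu3 used here (0 below 0, x³/3 on [0, 1), x − 2/3 from 1 upwards) is an assumption, since the thread does not show the benchmark source, and the function names are hypothetical. The sketch only shows that the two forms compute the same value; it says nothing about their speed.

```python
import math


def relu3_if(x: float) -> float:
    # Branching form: each input takes exactly one path.
    if x < 0.0:
        return 0.0
    elif x < 1.0:
        return x ** 3 / 3.0
    else:
        return x - 2.0 / 3.0


def relu3_mask(x: float) -> float:
    # Masking form: evaluate every piece, then select with 0/1 masks.
    # This branch-free shape is what tends to vectorise easily.
    mid = float(0.0 <= x < 1.0)
    high = float(x >= 1.0)
    return mid * (x ** 3 / 3.0) + high * (x - 2.0 / 3.0)


# The two forms agree on each piece and at the boundaries.
for x in (-1.0, 0.0, 0.5, 1.0, 2.0):
    assert math.isclose(relu3_if(x), relu3_mask(x), abs_tol=1e-12)
```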

awf commented 3 years ago

Cool! Do we think that float is more than 2x the speed of double? I realize that's not easily checkable from here, just wondering.

toelli-msft commented 3 years ago

Looks like it, based on the difference between these two (from https://github.com/microsoft/knossos-ksc/pull/877). PyTorch backwards goes from 3.4ms on float64 to 1.3ms on float32, which is a bit more than 2x faster. That would be consistent with vectorisation.

float64

https://user-images.githubusercontent.com/51626669/125292570-7ebe7d00-e31a-11eb-85c8-395b7e05d90b.png

float32

https://user-images.githubusercontent.com/51626669/125289029-aca1c280-e316-11eb-91ad-2d282e458e91.png
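The arithmetic behind that reading can be sketched as follows. A 512-bit vector register (the width requested by -mprefer-vector-width=512) holds twice as many float32 lanes as float64 lanes, so vectorisation alone would suggest roughly a 2x gap. The timings are the PyTorch backwards numbers quoted above; attributing the remainder to any particular cause would be speculation.

```python
VECTOR_WIDTH_BITS = 512  # matches -mprefer-vector-width=512

lanes_f32 = VECTOR_WIDTH_BITS // 32  # float32 values per register
lanes_f64 = VECTOR_WIDTH_BITS // 64  # float64 values per register

# Speedup expected from lane count alone.
expected_speedup = lanes_f32 / lanes_f64

# Observed PyTorch backwards timings quoted in this thread (milliseconds).
time_f64_ms = 3.4
time_f32_ms = 1.3
observed_speedup = time_f64_ms / time_f32_ms

print(f"lanes: {lanes_f32} float32 vs {lanes_f64} float64")
print(f"expected from lanes alone: {expected_speedup:.1f}x")
print(f"observed: {observed_speedup:.2f}x")
```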

toelli-msft commented 2 years ago

We still have great performance post https://github.com/microsoft/knossos-ksc/pull/931/files

awf commented 2 years ago

> We still have great performance post https://github.com/microsoft/knossos-ksc/pull/931/files

Tables updated above?

toelli-msft commented 2 years ago

> Tables updated above?

I'm not sure what you mean, but if you're asking whether the tables above have been updated to show performance after https://github.com/microsoft/knossos-ksc/pull/931/files was merged, then no, they haven't been updated. Here are such tables though:

(86af4427236297713424a21f88c5806279c31686)

[3 images: benchmark tables]