microsoft / knossos-ksc

Compiler with automatic differentiation
Other
45 stars 10 forks source link

Allow pytest benchmark to measure with or without CUDA transfer times #854

Closed cgravill closed 3 years ago

cgravill commented 3 years ago

I've tried separating out the setup / transfer times in a few ways that ended up being messy. Here's an approach that solves the specific problem in I think a reasonable way.

It pulls all device management responsibility out of the example. Speculatively move all results to CPU, it won't hurt the tensors already there and time wasted should be minimal - and not affect any benchmark numbers.

Thoughts?

AB#19085

cgravill commented 3 years ago

Windows / Surface Book 2 GPU

image