microsoft / tensorflow-directml

Fork of TensorFlow accelerated by DirectML
Apache License 2.0

Optimize output allocation for inputs that can be forwarded #387

Closed PatriceVignola closed 1 year ago

PatriceVignola commented 1 year ago

When in-place execution is possible (mostly for element-wise operators), forward one of the inputs to the output if all of the following conditions hold:

  1. The input isn't used by another operator that hasn't been executed yet.
  2. The input has the same datatype, shape and size as the output.
  3. For composite operators, the input isn't used more than once. This matters because the shared input/output buffer is written in a non-deterministic order, so an element may already have been overwritten by the time it is read a second time.
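
The three conditions can be sketched as a small selection routine. This is only an illustration in Python, not the code from this PR: `TensorInfo`, `pick_forwardable_input`, and the fields `pending_consumers` and `reads_by_this_op` are hypothetical names, and the sketch assumes the runtime can already supply those counts.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical description of one operator input (not a real TensorFlow type)."""
    dtype: str
    shape: Tuple[int, ...]
    pending_consumers: int  # other operators that still have to read this buffer
    reads_by_this_op: int   # how many times the current operator reads this input


def pick_forwardable_input(
    inputs: Sequence[TensorInfo],
    output_dtype: str,
    output_shape: Tuple[int, ...],
    is_composite: bool,
) -> Optional[int]:
    """Return the index of the first input whose buffer can be reused as the output."""
    for index, candidate in enumerate(inputs):
        # Condition 1: no operator that hasn't run yet still needs this input.
        if candidate.pending_consumers > 0:
            continue
        # Condition 2: same datatype and shape (which also implies the same size).
        if candidate.dtype != output_dtype or candidate.shape != tuple(output_shape):
            continue
        # Condition 3: a composite operator must not read the input more than once,
        # because an earlier write may already have clobbered the element.
        if is_composite and candidate.reads_by_this_op > 1:
            continue
        return index
    # No input qualifies, so a fresh output buffer has to be allocated.
    return None
```

A caller would allocate a new output only when the function returns `None`; otherwise it would reuse the buffer of the input at the returned index.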