microsoft / tensorflow-directml-plugin

DirectML PluggableDevice plugin for TensorFlow 2
Apache License 2.0

Fix memory leak in SegmentReduction ops and improve performance #279

Closed PatriceVignola closed 2 years ago

PatriceVignola commented 2 years ago

The RAII wrapper that we used to deallocate the output of the SegmentReduction ops run on the CPU through the Eager API wasn't actually freeing the memory: the pointer we passed to it was always null at the time it was passed, so the wrapper never saw the handle that was allocated afterwards. To fix this, we now pass a pointer to the pointer (TFE_TensorHandle**), so the wrapper reads the handle's value only when it is destroyed.
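To illustrate the failure mode, here is a minimal sketch rather than the plugin's actual code. FakeHandle, NewHandle, DeleteHandle, Execute, BuggyCleanup and FixedCleanup are all hypothetical stand-ins (the real code deals with TFE_TensorHandle and the TensorFlow C API), chosen so the example compiles without TensorFlow. It assumes the output handle is filled in through an out-parameter after the wrapper has been constructed.

```cpp
#include <cassert>

// Hypothetical stand-in for TFE_TensorHandle (assumption, not the real type).
struct FakeHandle {};

// Counts handles that have been allocated and not yet released.
static int g_live_handles = 0;

FakeHandle* NewHandle() {
  ++g_live_handles;
  return new FakeHandle();
}

void DeleteHandle(FakeHandle* handle) {
  if (handle == nullptr) return;  // Deleting a null handle is a no-op.
  --g_live_handles;
  delete handle;
}

// Hypothetical stand-in for an Eager-style call that returns its result
// through an out-parameter.
void Execute(FakeHandle** output) { *output = NewHandle(); }

// Buggy wrapper: copies the pointer VALUE at construction time. If the
// pointer is still null at that point, the destructor releases nothing.
class BuggyCleanup {
 public:
  explicit BuggyCleanup(FakeHandle* handle) : handle_(handle) {}
  BuggyCleanup(const BuggyCleanup&) = delete;
  BuggyCleanup& operator=(const BuggyCleanup&) = delete;
  ~BuggyCleanup() { DeleteHandle(handle_); }

 private:
  FakeHandle* handle_;
};

// Fixed wrapper: stores the ADDRESS of the pointer and dereferences it only
// in the destructor, after the handle has been filled in.
class FixedCleanup {
 public:
  explicit FixedCleanup(FakeHandle** handle) : handle_(handle) {}
  FixedCleanup(const FixedCleanup&) = delete;
  FixedCleanup& operator=(const FixedCleanup&) = delete;
  ~FixedCleanup() { DeleteHandle(*handle_); }

 private:
  FakeHandle** handle_;
};

// Returns how many handles are still alive after the scope exits.
int LeakedWithBuggy() {
  const int before = g_live_handles;
  {
    FakeHandle* output = nullptr;
    BuggyCleanup cleanup(output);  // Captures nullptr.
    Execute(&output);              // The handle is allocated afterwards.
    assert(output != nullptr);
  }  // The destructor deletes nullptr, so the handle leaks.
  return g_live_handles - before;
}

int LeakedWithFixed() {
  const int before = g_live_handles;
  {
    FakeHandle* output = nullptr;
    FixedCleanup cleanup(&output);  // Captures the address of `output`.
    Execute(&output);
    assert(output != nullptr);
  }  // The destructor reads *handle_ and frees the real handle.
  return g_live_handles - before;
}
```

In this sketch LeakedWithBuggy() leaves one handle alive per call, while LeakedWithFixed() leaves none.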

This change also gives a slight performance increase by adding the CopyDeviceTensorsToCPU function, which lets kernels copy more than one tensor at once and flush/sync only once instead of twice.
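The batching idea can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: FakeQueue, CopyRequest and the function signatures are hypothetical, tensors are modelled as std::vector<float>, and only the name CopyDeviceTensorsToCPU comes from this PR. The point is that queuing every copy first and then synchronizing once replaces one flush/sync per tensor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a device execution queue (assumption).
struct FakeQueue {
  struct Pending {
    const std::vector<float>* src;
    std::vector<float>* dst;
  };

  std::vector<Pending> pending;
  int sync_count = 0;  // Number of flush/sync round trips performed.

  // Records a copy; nothing is transferred until FlushAndSync().
  void EnqueueCopy(const std::vector<float>& src, std::vector<float>* dst) {
    pending.push_back({&src, dst});
  }

  // Performs all queued copies and waits for them to finish.
  void FlushAndSync() {
    for (const Pending& copy : pending) *copy.dst = *copy.src;
    pending.clear();
    ++sync_count;
  }
};

// Single-tensor path: one flush/sync per tensor.
void CopyDeviceTensorToCPU(FakeQueue& queue, const std::vector<float>& src,
                           std::vector<float>* dst) {
  queue.EnqueueCopy(src, dst);
  queue.FlushAndSync();
}

// One source/destination pair for the batched path (hypothetical type).
struct CopyRequest {
  const std::vector<float>* src;
  std::vector<float>* dst;
};

// Batched path: queue every copy, then flush/sync a single time.
// The signature is an assumption made for this sketch.
void CopyDeviceTensorsToCPU(FakeQueue& queue,
                            const std::vector<CopyRequest>& requests) {
  for (const CopyRequest& request : requests) {
    queue.EnqueueCopy(*request.src, request.dst);
  }
  queue.FlushAndSync();
}

// Copies two tensors one at a time and returns the number of syncs.
int SyncsForTwoSeparateCopies() {
  FakeQueue queue;
  const std::vector<float> device_a = {1.0f, 2.0f};
  const std::vector<float> device_b = {3.0f};
  std::vector<float> cpu_a, cpu_b;
  CopyDeviceTensorToCPU(queue, device_a, &cpu_a);
  CopyDeviceTensorToCPU(queue, device_b, &cpu_b);
  assert(cpu_a == device_a && cpu_b == device_b);
  return queue.sync_count;
}

// Copies the same two tensors in one batch and returns the number of syncs.
int SyncsForOneBatchedCopy() {
  FakeQueue queue;
  const std::vector<float> device_a = {1.0f, 2.0f};
  const std::vector<float> device_b = {3.0f};
  std::vector<float> cpu_a, cpu_b;
  CopyDeviceTensorsToCPU(queue, {{&device_a, &cpu_a}, {&device_b, &cpu_b}});
  assert(cpu_a == device_a && cpu_b == device_b);
  return queue.sync_count;
}
```

With two tensors, the separate path synchronizes twice and the batched path synchronizes once, which matches the saving described above.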