Fix GPU-CPU device mismatch error in util filter_dilated_rows

tklausen commented 4 months ago

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Docs change / refactoring / dependency upgrade

Motivation and Context / Related issue

The function filter_dilated_rows in tensor_utils.py converts a tensor to an ndarray, modifies the ndarray, and converts the modified ndarray back to a tensor.

Bug: If the original tensor is not on the CPU, the conversion to an ndarray fails because tensor.cpu() is never called first.

File "opacus/utils/tensor_utils.py", line 328, in filter_dilated_rows
    tensor_np = tensor.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
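
For context, the error message itself points at the smallest possible workaround: copy the tensor to host memory before converting it. The sketch below shows that workaround only; `to_numpy_on_host` is a hypothetical helper name, not part of Opacus, and this is not the approach this PR takes.

```python
import numpy as np
import torch


def to_numpy_on_host(tensor: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: copy a tensor to host memory, then convert it."""
    # .detach() drops autograd tracking; .cpu() returns the tensor unchanged
    # if it is already on the CPU and copies it to host memory otherwise.
    return tensor.detach().cpu().numpy()


# Use a CUDA tensor when one is available so the failing case is exercised.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)
arr = to_numpy_on_host(t)
print(arr.shape)
```

This still pays for a device-to-host copy and leaves the caller to move the result back to the original device afterwards.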

Fix: This PR modifies the tensor directly, without ever converting it to an ndarray. This fixes the bug and is more efficient than the original implementation because it avoids the round trip through NumPy.
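
The merged diff is not reproduced here. As a rough illustration of what operating directly on the tensor can look like, the sketch below keeps every `step`-th entry along one axis using `torch.index_select`, with the index tensor created on the input's own device. The function name `take_strided` and its parameters are assumptions made for this example, not Opacus identifiers, and the real `filter_dilated_rows` may be structured differently.

```python
import torch


def take_strided(tensor: torch.Tensor, dim: int, size: int, step: int) -> torch.Tensor:
    """Illustrative only: keep entries 0, step, 2*step, ... (< size) along `dim`."""
    # Build the indices on the same device as the input so that no
    # CPU/GPU mismatch can occur and no host copy is needed.
    idx = torch.arange(0, size, step, device=tensor.device)
    return torch.index_select(tensor, dim, idx)


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]], device=device)

# Keep indices 0 and 2 along the last axis.
out = take_strided(x, dim=2, size=3, step=2)
print(out)
```

Because the indices live on `tensor.device`, the result stays on the same device as the input and the same code path works for CPU and CUDA tensors.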

How Has This Been Tested (if it applies)

Manually tested with the example provided in the function's docstring.

Also, filter_dilated_rows is called whenever the dilation of a 3D convolution is not 1, so this function is exercised implicitly by tests/grad_samples/conv3d_test.py.

Checklist

[x] The documentation is up-to-date with the changes I made.
[x] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
[x] All tests passed, and additional code has been covered with new tests.

facebook-github-bot commented 4 months ago

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 4 months ago

This pull request has been merged in pytorch/opacus@32a465bfa980777e60a763616c83429785aa2ac6.

karthikprasad commented 4 months ago

Thanks for the fix @tklausen :)
