Closed: TroyGarden closed this pull request 2 months ago
This pull request was exported from Phabricator. Differential Revision: D58906839
| Name | Link |
|---|---|
| Latest commit | ea1ca6a1de3721543d2fae45f22d8cc341c68f59 |
| Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6676eeb209af39000818affb |
This pull request has been merged in pytorch/FBGEMM@f8021eea2bb3da9baac31a45d16775368b876223.
Summary:

performance notes

The good:
a) `_all_keys_used_once` is no longer needed.
b) `_pin_and_move` for the metadata (arguments) is no longer needed; it is handled inside the operator, which is more tracing-friendly.

The same bad:
a) `permutes`, `input_lengths`, and `output_lengths`: those tensors need to be on the device so that the CUDA kernels have access to them.
b) [resolved] 2 lists of (`scalar_t*`) pointers, the input and output tensor lists.
c) [resolved] Didn't find a good way to let the kernel know the addresses of the lists of input/output tensors, because the lists also need to be on the device.

benchmark

traces
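The resolved points b) and c) above describe a common pattern: a GPU kernel cannot dereference a host-side list of tensor pointers, so the pointer (or offset) table itself has to live in device memory alongside the data. A minimal CPU-side sketch of the offset-table idea, using plain Python lists in place of tensors (the names `flatten_with_offsets` and `permute_segments` are illustrative, not the operator's actual API):

```python
# Sketch: address many "tensors" through one flat buffer plus an offset
# table, instead of a host-side list of per-tensor pointers. In the real
# CUDA operator the flat buffer, offsets, lengths, and permute order would
# all be device tensors so kernel threads can read them directly.

def flatten_with_offsets(tensors):
    """Concatenate a list of 1-D segments into one flat buffer and
    record the start position of each segment."""
    flat, offsets = [], []
    for t in tensors:
        offsets.append(len(flat))
        flat.extend(t)
    return flat, offsets

def permute_segments(flat, offsets, lengths, permute):
    """Stand-in for the kernel: copy segments out of the flat buffer
    in permuted order. permute[i] is the source segment written to
    output slot i."""
    out = []
    for src in permute:
        start = offsets[src]
        out.extend(flat[start:start + lengths[src]])
    return out

inputs = [[1, 2], [3, 4, 5], [6]]
flat, offsets = flatten_with_offsets(inputs)
lengths = [len(t) for t in inputs]

# Emit segment 2 first, then 0, then 1.
print(permute_segments(flat, offsets, lengths, [2, 0, 1]))  # [6, 1, 2, 3, 4, 5]
```

The design point is that the metadata (`offsets`, `lengths`, the permute order) is itself array-shaped, so it can be copied to the device once and indexed by every thread, avoiding any host-side pointer chasing.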
Differential Revision: D58906839