Closed salilmishra23 closed 4 years ago
I think the culprit is this:
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
That op is implemented in C++ and requires tensor storage, which XLA tensors apparently do not provide.
Is there any workaround or fix for this?
One way, I believe, would be to move the tensors from the PyTorch/XLA TPU device back to the CPU and run NMS there.
To add an example:
import torch
import torchvision  # registers torch.ops.torchvision.nms
import torch_xla.core.xla_model as xm
device = boxes.device  # the TPU device the tensors originally live on
xm.mark_step()  # materialize computation results up to NMS
boxes_cpu = boxes.cpu().clone()  # move from TPU to CPU
scores_cpu = scores.cpu().clone()  # ditto
keep = torch.ops.torchvision.nms(boxes_cpu, scores_cpu, iou_threshold)  # runs on CPU
keep = keep.to(device=device)  # send back to TPU so the rest of the model pass runs on TPU
Note that this will likely be slow due to TPU<>CPU communication.
❓ Questions and Help
Thanks for the great package!
I get the following error when trying to train Faster RCNN on TPU. The code works for GPU.
I'm providing the link to the Colab. https://colab.research.google.com/drive/1ShGj4Uq8eFgXE1jqfzH9-v1UkaKKLuHq?usp=sharing