A question about the time consumed by copying a tensor from CPU to GPU

Hi all, I am trying to add a light-head module to Faster R-CNN. I added a new light_branch in faster_rcnn_heads.py that combines the functions of box_head and box_out. This module needs to move the rois in rpn_return from the CPU to the GPU, but I find this operation much slower than the roi_feature_transform in module_builder.py (0.04s vs. <1e-3s). I then moved these operations into module_builder.py, and the copy time became much smaller, but the time to get restore_bl increased to 0.04s. So the 0.04s did not disappear; it moved to a different step. This has puzzled me for a long time. Could anyone help me figure it out?
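One possible explanation, not confirmed in this thread: CUDA kernels are launched asynchronously, so a plain wall-clock timer around a single line can end up charging that line for GPU work queued earlier. A host-to-device copy is a typical place where the CPU ends up waiting. The sketch below shows one way to time a step with explicit synchronization so that the cost is attributed to the step that actually incurs it. The helper name `timed` and the random `rois` tensor are hypothetical stand-ins, not code from the repository, and the snippet falls back to the CPU when no GPU is present.

```python
import time

import torch


def timed(fn, *args, sync=True):
    """Return (result, seconds) for fn(*args), synchronizing CUDA around the call."""
    use_cuda = torch.cuda.is_available()
    if sync and use_cuda:
        torch.cuda.synchronize()  # drain GPU work queued before the timer starts
    start = time.perf_counter()
    out = fn(*args)
    if sync and use_cuda:
        torch.cuda.synchronize()  # wait for the GPU work launched by fn itself
    return out, time.perf_counter() - start


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the rois in rpn_return (shape chosen arbitrarily).
rois = torch.rand(2000, 5)

rois_dev, seconds = timed(lambda t: t.to(device), rois)
print(f"copy to {device}: {seconds:.6f}s")
```

If the copy looks cheap when measured this way but expensive with `sync=False`, the 0.04s most likely belongs to earlier GPU work rather than to the copy itself.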

roytseng-tw / Detectron.pytorch

a question about the time consumed by copying tensor from cpu to gpu #215