ppengtang / pcl.pytorch

PyTorch codes for our papers "Multiple Instance Detection Network with Online Instance Classifier Refinement" and "PCL: Proposal Cluster Learning for Weakly Supervised Object Detection".
MIT License

Too long test time #45

Closed MRRRKING closed 3 years ago

MRRRKING commented 3 years ago

Thank you for your code. When I use the pytorch1.6.0 branch and the default settings, the test time is very long (~7 days). My GPU is a 1080Ti, and the training time is normal (~13 hours). Have you ever encountered the same problem? Do you have any ideas? Thanks a lot.

ppengtang commented 3 years ago

Hi, I haven't met this problem before. I have asked my friends to help me test on their machines and the test time looks normal (2-3 hours to test on VOC 2007 test). Could you check whether you run multiple experiments on the same GPU at the same time?

U201714643 commented 3 years ago

Hi, I haven't met this problem before. I have asked my friends to help me test on their machines and the test time looks normal (2-3 hours to test on VOC 2007 test). Could you check whether you run multiple experiments on the same GPU at the same time?

Thank you for your code. But I also encountered this problem. My GPU is a 3080, and the test time is also very long (~36 hours), while the training time is normal (~6 hours). My PyTorch version is 1.7.0 with CUDA 11.1 (NVIDIA driver version: 455.38), and I used the default settings (although I installed mmcv for CUDA 11). Is this problem related to the CUDA or driver version? Thanks a lot.

ppengtang commented 3 years ago

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

MRRRKING commented 3 years ago

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

Yes, I followed the install.sh file, and the GPU utilization is 100%. I tried setting the parameter TEST.BBOX_AUG.ENABLED to False, and then the testing time is normal (~2 hours). I suspect the problem lies in the im_detect_bbox_aug function: https://github.com/ppengtang/pcl.pytorch/blob/dc16cfa840fbe65f558acce8fd43c67a4530afc3/lib/core/test.py#L136 Do you have any direction to solve the problem?

ppengtang commented 3 years ago

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

Yes, I followed the install.sh file. And the GPU utilization is 100%. I try to set the parameter TEST.BBOX_AUG.ENABLED to False, and the testing time is normal(~2 hours). I suspect the problem lies in the im_detect_bbox_aug function.

https://github.com/ppengtang/pcl.pytorch/blob/dc16cfa840fbe65f558acce8fd43c67a4530afc3/lib/core/test.py#L136

Do you have any direction to solve the problem?

That's weird. Sorry I don't have 1080Ti GPUs and thus cannot reproduce the issue. Could you try to record the time cost of each part in im_detect_bbox_aug and the time cost of each line in these codes?

In addition, if TEST.BBOX_AUG.ENABLED is set to False, the test time will be reduced by about 10x, so the reasonable test time should be less than half an hour.
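For reference, a minimal sketch of the toggle (the import path below is an assumption based on the Detectron-style config layout this repo follows; editing the yaml config achieves the same thing):

    # hedged sketch: turn off test-time bbox augmentation before testing.
    # With BBOX_AUG enabled, each image is forwarded at several scales (and flips),
    # which is why disabling it cuts the test time by roughly 10x.
    from core.config import cfg  # assumed import path of the global config

    cfg.TEST.BBOX_AUG.ENABLED = False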

U201714643 commented 3 years ago

Hi @MRRRKING @U201714643 , we have tested on V100 and 2080Ti GPUs. The testing speed looks reasonable (less than 3 hours on VOC 2007 test). Could you try to follow install.sh to set up environments? Could you also check the GPU utilization rate by nvidia-smi?

Yes, I followed the install.sh file. And the GPU utilization is 100%. I try to set the parameter TEST.BBOX_AUG.ENABLED to False, and the testing time is normal(~2 hours). I suspect the problem lies in the im_detect_bbox_aug function. https://github.com/ppengtang/pcl.pytorch/blob/dc16cfa840fbe65f558acce8fd43c67a4530afc3/lib/core/test.py#L136

Do you have any direction to solve the problem?

That's weird. Sorry I don't have 1080Ti GPUs and thus cannot reproduce the issue. Could you try to record the time cost of each part in im_detect_bbox_aug and the time cost of each line in these codes?

In addition, if TEST.BBOX_AUG.ENABLED is set to False, the test time will be reduced by about 10x, so the reasonable test time should be less than half an hour.

Hi. I think this line might lead to the long test time: https://github.com/ppengtang/pcl.pytorch/blob/dc16cfa840fbe65f558acce8fd43c67a4530afc3/lib/modeling/model_builder.py#L119 This is my way to measure its run time:

    torch.cuda.synchronize()
    start = time.time()
    ############################
    blob_conv = self.Conv_Body(im_data).contiguous()
    ############################
    torch.cuda.synchronize()
    end = time.time()
    print('blob_conv = self.Conv_Body(im_data).contiguous():',end-start,'s')

And this is the result:

    blob_conv = self.Conv_Body(im_data).contiguous(): 0.16710186004638672 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.22522902488708496 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.21841096878051758 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.9169421195983887 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.9236195087432861 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 2.9725072383880615 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 2.966435432434082 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 8.325863361358643 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 8.330979108810425 s
    blob_conv = self.Conv_Body(im_data).contiguous(): 0.15090179443359375 s
    INFO test_engine.py: 270: im_detect: range [1, 4952] of 4952: 1/4952 25.556s (eta: 1 day, 11:08:47)

Do you have any direction to solve the problem?

ppengtang commented 3 years ago

Could you try to add torch.cuda.empty_cache() after this line of code? I'm not sure about the exact reason. I guess one possible reason is that the GPU cache has not been released after testing each image.
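For illustration, a minimal sketch of the idea (the loop and names here are hypothetical; in the repo the call would go into the per-image loop of the test engine):

    import torch

    def run_test(model, images):
        """Hypothetical per-image test loop showing where empty_cache() would go."""
        results = []
        for im in images:
            with torch.no_grad():
                results.append(model(im))
            # release cached (but unused) GPU memory before the next image
            torch.cuda.empty_cache()
        return results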

ppengtang commented 3 years ago

Btw, could you also make sure to add CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and to not use --multi-gpu-testing? There are some bugs in multi-GPU testing.

MRRRKING commented 3 years ago

I test the running time in this way:

    print(target_scale)
    print('*****')
    time4 = time()
    return_dict = model(**inputs)
    time5 = time()
    print('time5: ', time5 - time4)

    # cls prob (activations after softmax)
    scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()

    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.reshape([-1, scores.shape[-1]])

    time6 = time()
    print('time6: ', time6 - time5)

And the result is below:

    480
    time5: 1.396902084350586   time6: 0.05057382583618164
    576
    time5: 0.0050432682037353516   time6: 2.703824758529663
    688
    time5: 0.005361795425415039   time6: 4.799558162689209
    864
    time5: 0.005181312561035156   time6: 11.79275107383728
    1200
    time5: 0.0075037479400634766   time6: 27.75927186012268

From the test results, the time is not spent on model prediction but on the data conversion here: https://github.com/ppengtang/pcl.pytorch/blob/0896c82e2cef68c4c1a33078f863d98649ef30fe/lib/core/test.py#L109

I tested the old code, which runs in the PyTorch 0.4.1 environment, and its testing time is normal. Is it related to the PyTorch version?

U201714643 commented 3 years ago

Could you try to add torch.cuda.empty_cache() after this line of code? I'm not sure about the exact reason. I guess one possible reason is that the GPU cache has not been released after testing each image.

Hi, thanks for your advice, but it seems not to work. I am sure that I have added CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and have not used --multi-gpu-testing. Moreover, I think this problem might be related to vgg16, because when i = 5, line 113 takes too much time to run: https://github.com/ppengtang/pcl.pytorch/blob/4c3cfc95807a6197b2d96afce37fe74d7975a456/lib/modeling/vgg16.py#L111-L114 For example:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  1 )(): 0.014625310897827148 s
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  2 )(): 0.011127233505249023 s
Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  3 )(): 0.015486001968383789 s
Sequential(
  (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  4 )(): 0.014803886413574219 s
Sequential(
  (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  5 )(): 8.2745041847229 s
Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

And this is my way to measure its run time:

    print('-------------------------------------------------------------------')
    torch.cuda.synchronize()
    start = time.time()
    ############################
    x = getattr(self, 'conv%d' % i)(x)
    ############################
    torch.cuda.synchronize()
    end = time.time()
    print('x = getattr(self, \'conv%d\' % ',i,')():',end-start,'s')
    print(getattr(self, 'conv%d' % i))
    print('-------------------------------------------------------------------')

Besides, during testing, my GPU Utilization is about 95%, and VRAM usage is about 4500MB. Do you have any direction to solve the problem?

ppengtang commented 3 years ago

I test the running time in this way:

    print(target_scale)
    print('*****')
    time4 = time()
    return_dict = model(**inputs)
    time5 = time()
    print('time5: ', time5 - time4)

    # cls prob (activations after softmax)
    scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()

    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.reshape([-1, scores.shape[-1]])

    time6 = time()
    print('time6: ', time6 - time5)

And the result is below:

    480
    time5: 1.396902084350586   time6: 0.05057382583618164
    576
    time5: 0.0050432682037353516   time6: 2.703824758529663
    688
    time5: 0.005361795425415039   time6: 4.799558162689209
    864
    time5: 0.005181312561035156   time6: 11.79275107383728
    1200
    time5: 0.0075037479400634766   time6: 27.75927186012268

From the test results, the time is not spent on model prediction but on the data conversion here: https://github.com/ppengtang/pcl.pytorch/blob/0896c82e2cef68c4c1a33078f863d98649ef30fe/lib/core/test.py#L109

I tested the old code, which runs in the PyTorch 0.4.1 environment, and its testing time is normal. Is it related to the PyTorch version?

Could you try to change these codes to the following?

    scores = return_dict['refine_score'][0].squeeze()
    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].squeeze()
    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.view(-1, scores.size(-1)).data.cpu().numpy()

I don't think the issue is from the PyTorch version. On my GPUs, I can get correct results using PyTorch 1.6.0.
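As a side note, a standalone sketch of the same idea (the helper name is hypothetical; the point is to accumulate the scores on the GPU and do a single device-to-host copy at the end, instead of one .cpu().numpy() conversion per refinement branch):

    import torch

    def average_refine_scores(refine_scores):
        """Average per-branch refinement scores on the GPU, then copy to CPU once."""
        # refine_scores: list of tensors of shape [num_proposals, num_classes]
        scores = refine_scores[0].clone()
        for s in refine_scores[1:]:
            scores += s
        scores /= len(refine_scores)
        # single device-to-host copy instead of one per refinement branch
        return scores.view(-1, scores.size(-1)).cpu().numpy()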

ppengtang commented 3 years ago

Could you try to add torch.cuda.empty_cache() after this line of code? I'm not sure about the exact reason. I guess one possible reason is that the GPU cache has not been released after testing each image.

Hi, thanks for your advice, but it seems not to work. I am sure that I have added CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and have not used --multi-gpu-testing. Moreover, I think this problem might be related to vgg16, because when i = 5, line 113 takes too much time to run: https://github.com/ppengtang/pcl.pytorch/blob/4c3cfc95807a6197b2d96afce37fe74d7975a456/lib/modeling/vgg16.py#L111-L114

For example:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  1 )(): 0.014625310897827148 s
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  2 )(): 0.011127233505249023 s
Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  3 )(): 0.015486001968383789 s
Sequential(
  (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  4 )(): 0.014803886413574219 s
Sequential(
  (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' %  5 )(): 8.2745041847229 s
Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (1): ReLU(inplace=True)
  (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (3): ReLU(inplace=True)
  (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

And this is my way to measure its run time:

    print('-------------------------------------------------------------------')
    torch.cuda.synchronize()
    start = time.time()
    ############################
    x = getattr(self, 'conv%d' % i)(x)
    ############################
    torch.cuda.synchronize()
    end = time.time()
    print('x = getattr(self, \'conv%d\' % ',i,')():',end-start,'s')
    print(getattr(self, 'conv%d' % i))
    print('-------------------------------------------------------------------')

Besides, during testing, my GPU Utilization is about 95%, and VRAM usage is about 4500MB. Do you have any direction to solve the problem?

That's weird... Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?
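For clarity, a minimal sketch of the conv5 variant being suggested here, written from the Sequential printout above (the actual layers are built in lib/modeling/vgg16.py):

    import torch.nn as nn

    # default conv5 block: 3x3 convs with padding=2, dilation=2 (as printed above)
    conv5_dilated = nn.Sequential(
        nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=2, dilation=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=2, dilation=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=2, dilation=2),
        nn.ReLU(inplace=True),
    )

    # the suggested experiment: the same block with padding=1, dilation=1
    conv5_plain = nn.Sequential(
        nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, dilation=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, dilation=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, dilation=1),
        nn.ReLU(inplace=True),
    )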

U201714643 commented 3 years ago

That's weird... Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice. It works. Now mAP is 51.3% and CorLoc is 67.0% (the model has been re-trained). In addition, my training time is about 6 hours and my testing time is about 1 hour. Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.

ppengtang commented 3 years ago

That's weird... Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice. It works. Now mAP is 51.3%, and CorLoc is 67.0%.(Model has been re-trained.) In addition, My training time is about 6 hours, and testing time is about 1 hour. Besides, My pytorch version is 1.7.1, because RTX3080 doesn't support cuda 10.2, which is required by pytorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

U201714643 commented 3 years ago

That's weird... Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice. It works. Now mAP is 51.3%, and CorLoc is 67.0%.(Model has been re-trained.) In addition, My training time is about 6 hours, and testing time is about 1 hour. Besides, My pytorch version is 1.7.1, because RTX3080 doesn't support cuda 10.2, which is required by pytorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

Thanks for your advice. But I have already re-trained the model with dilation 1.

ppengtang commented 3 years ago

That's weird... Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice. It works. Now mAP is 51.3%, and CorLoc is 67.0%.(Model has been re-trained.) In addition, My training time is about 6 hours, and testing time is about 1 hour. Besides, My pytorch version is 1.7.1, because RTX3080 doesn't support cuda 10.2, which is required by pytorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

Thanks for your advice. But I have re-trained model with dilation 1.

I see. Maybe dilation 1 is the reason for the performance drop.

ppengtang commented 3 years ago

That's weird... Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?

Thanks for your advice. It works. Now mAP is 51.3%, and CorLoc is 67.0%.(Model has been re-trained.) In addition, My training time is about 6 hours, and testing time is about 1 hour. Besides, My pytorch version is 1.7.1, because RTX3080 doesn't support cuda 10.2, which is required by pytorch 1.6.0.

It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?

Thanks for your advice. But I have re-trained model with dilation 1.

Btw, could you try to add

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

after this line of code for dilation=2?

U201714643 commented 3 years ago

Btw, could you try to add

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

after this line of codes for dilation=2?

Thanks for your advice. But the testing time is still 36 hours with dilation 2.

MRRRKING commented 3 years ago

I test the running time in this way:

    print(target_scale)
    print('*****')
    time4 = time()
    return_dict = model(**inputs)
    time5 = time()
    print('time5: ', time5 - time4)

    # cls prob (activations after softmax)
    scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()

    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()

    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.reshape([-1, scores.shape[-1]])

    time6 = time()
    print('time6: ', time6 - time5)

And the result is below:

    480
    time5: 1.396902084350586   time6: 0.05057382583618164
    576
    time5: 0.0050432682037353516   time6: 2.703824758529663
    688
    time5: 0.005361795425415039   time6: 4.799558162689209
    864
    time5: 0.005181312561035156   time6: 11.79275107383728
    1200
    time5: 0.0075037479400634766   time6: 27.75927186012268

From the test results, the time is not spent on model prediction but on the data conversion here: https://github.com/ppengtang/pcl.pytorch/blob/0896c82e2cef68c4c1a33078f863d98649ef30fe/lib/core/test.py#L109

I tested the old code, which runs in the PyTorch 0.4.1 environment, and its testing time is normal. Is it related to the PyTorch version?

Could you try to change these codes to the following?

    scores = return_dict['refine_score'][0].squeeze()
    for i in range(1, cfg.REFINE_TIMES):
        scores += return_dict['refine_score'][i].squeeze()
    scores /= cfg.REFINE_TIMES
    # In case there is 1 proposal
    scores = scores.view(-1, scores.size(-1)).data.cpu().numpy()

I don't think the issue is from the PyTorch version. On my GPUs, I can get correct results using PyTorch 1.6.0.

I replaced this code, but it didn't work. The testing time is normal with dilation 1, but the mAP is lower too.

U201714643 commented 3 years ago

I replaced these codes, but it didn't work. The testing time is normal with dilation 1, but the mAP is lower too.

Are you using PyTorch 1.7.0 or 1.7.1?

Glutton-zh commented 3 years ago

Hello, I meet the same problem in the test. I use a GTX 1080Ti with PyTorch 1.6.0, but I made the following changes in install.sh:

① Changed pip --no-cache-dir install mmcv-full==latest+torch1.6.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html to pip install mmcv-full -f https://download.openmmlab.oss.com/mmcv/dist/cu101/torch1.6.0/index.html (according to the official instructions), because the original command always reports errors for me.
② Changed pip --no-cache-dir install numpy==1.16.0 to pip --no-cache-dir install numpy==1.19.5, because when trying to solve the mmcv problem, the environment always downloads 1.16.0 first, then automatically deletes it and uses 1.19.5.

After that, training for 13 hours is normal, but the test is expected to take 6 days. I don't know if these changes lead to problems in the test.

ppengtang commented 3 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of code for dilation=2? Other people have observed a similar issue with slow dilated convolutions on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412
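For reference, a minimal sketch of where this flag would go (the placement near the top of the test script is an assumption; it just needs to run before the forward passes):

    import torch

    # fall back from cuDNN to PyTorch's native CUDA kernels for all convolutions;
    # this avoids the slow cuDNN dilated-convolution path discussed in the linked thread
    torch.backends.cudnn.enabled = False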

MRRRKING commented 3 years ago

I replaced these codes, but it didn't work. The testing time is normal with dilation 1, but the mAP is lower too.

Are you using pytorch 1.7.0 or 1.7.1?

No, I use pytorch 1.6.0.

U201714643 commented 3 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your advice. It works. Now mAP is 51.7% and CorLoc is 68.2% (the model is trained with dilation 2). In addition, the testing time is about 80 minutes. Besides, VRAM usage ranges between 7500MB and 9500MB, which is more than when testing with cuDNN enabled.

MRRRKING commented 3 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

It works. Thanks a lot. Now the testing time is about 2.5 hours, and the mAP is 51.9%.

Glutton-zh commented 3 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help! It works. The testing time is normal, about 2h24min, and Mean AP = 0.5231.

U201714643 commented 3 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help! It works. the testing time is normal about 2h24min, Mean AP = 0.5231

Hi, did you re-train your model with cuDNN disabled?

ppengtang commented 3 years ago

Great! Thanks for helping to debug! It is unnecessary to re-train the model with cudnn disabled. Btw, you could try different random seeds (1~10) by changing cfg.RNG_SEED to reproduce the reported numbers.

Glutton-zh commented 3 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of codes for dilation=2? Other people observed a similar issue of low speed of dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help! It works. the testing time is normal about 2h24min, Mean AP = 0.5231

Hi, Did you re-train your model with cudnn disabled?

I just added torch.backends.cudnn.enabled = False in test_net, and all other code is the default.

hpu-dxx commented 2 years ago

Could you also try to add torch.backends.cudnn.enabled = False after this line of code for dilation=2? Other people have observed a similar issue with slow dilated convolutions on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412

Thanks for your help! It works. The testing time is normal, about 2h24min, and Mean AP = 0.5231.

Hello! Do you know how to visualize the detection results?