tensorflow / models

Models and examples built with TensorFlow

Frozen pretrained Faster RCNN/RFCN networks from model zoo yielding different outputs on different GPUs and runs #2374

Closed · EpochalEngineer closed this issue 4 years ago

EpochalEngineer commented 7 years ago

System information

Describe the problem

Running the same frozen graph on different GPUs yields different results, and GPUs 1 and 2 are not deterministic. GPU selection is done by making the other devices invisible through the session config, so that TensorFlow runs only on GPU 0, only on GPU 1, and so forth. This uses the frozen pretrained networks from this repository's linked model zoo and the supplied object_detection_tutorial.ipynb with no modifications other than setting the CUDA visible_device_list in the session config. The SSD frozen models, however, give identical outputs on all 3 GPUs from what I have seen.
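For reference, this is roughly how the per-GPU selection is done in the test (a minimal sketch assuming a TF1-style frozen detection graph as in the tutorial; `run_detection`, `detection_graph`, and `image_np` are illustrative names, not the exact notebook code):

```python
import numpy as np
import tensorflow as tf

def run_detection(detection_graph, image_np, gpu_id):
    # Restrict this session to a single physical GPU via the session config,
    # rather than via the CUDA_VISIBLE_DEVICES environment variable.
    gpu_options = tf.GPUOptions(visible_device_list=str(gpu_id))
    config = tf.ConfigProto(gpu_options=gpu_options)
    with tf.Session(graph=detection_graph, config=config) as sess:
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        scores_out = sess.run(scores,
                              feed_dict={image_tensor: np.expand_dims(image_np, 0)})
        # Return the top 4 box scores for this image.
        return scores_out[0][:4]
```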

I have also run cuda_memtest on all 3 GPUs; the logs are attached.

UPDATE: I just tested on a second machine with 2 GPUs, and reproduced the issue. GPU 0 is deterministic, GPU 1 is not (and often produces bad results).

Source code / logs

I've attached a diff of the modified object_detection_tutorial.ipynb, which loops over the 3 GPUs 3 times each and prints the top box scores; these change from run to run. Also attached is a PDF of that notebook with the detections drawn on it. Text output:

Evaluating image 0

Running on GPU 0, top 4 box scores:
Iter 1: [ 0.99978215 0.99857557 0.95300484 0.91580492]
Iter 2: [ 0.99978215 0.99857557 0.95300484 0.91580492]
Iter 3: [ 0.99978215 0.99857557 0.95300484 0.91580492]

Running on GPU 1, top 4 box scores:
Iter 1: [ 0.68702352 0.16781448 0.13143283 0.12993629]
Iter 2: [ 0.18502565 0.16854601 0.08074528 0.07859289]
Iter 3: [ 0.18502565 0.16854601 0.05546702 0.05111229]

Running on GPU 2, top 4 box scores:
Iter 1: [ 0.68702352 0.16781448 0.13143283 0.12993629]
Iter 2: [ 0.18941374 0.18502565 0.16854601 0.16230994]
Iter 3: [ 0.18502565 0.16854601 0.05546702 0.05482833]

Evaluating image 1

Running on GPU 0, top 4 box scores:
Iter 1: [ 0.99755412 0.99750346 0.99380219 0.99067008]
Iter 2: [ 0.99755412 0.99750346 0.99380219 0.99067008]
Iter 3: [ 0.99755412 0.99750346 0.99380219 0.99067008]

Running on GPU 1, top 4 box scores:
Iter 1: [ 0.96881998 0.96441168 0.96164131 0.96006596]
Iter 2: [ 0.9377929 0.91686022 0.80374646 0.79758978]
Iter 3: [ 0.90396696 0.89217037 0.85456908 0.85334581]

Running on GPU 2, top 4 box scores:
Iter 1: [ 0.9377929 0.91686022 0.80374646 0.79758978]
Iter 2: [ 0.9377929 0.91686022 0.80374646 0.79758978]
Iter 3: [ 0.9377929 0.91686022 0.80374646 0.79758978]

object_detection_tutorial.diff.txt

gpu_output_differences.pdf

Updated with longer run: cuda_memtest.log.txt

EpochalEngineer commented 7 years ago

Updated with a simplified test using the model zoo, and with a second-machine test that reproduced these issues.

EpochalEngineer commented 7 years ago

@aselle Was there supposed to be a response added with the removal of that tag?

aselle commented 7 years ago

@nealwu, could you take a look?

nealwu commented 7 years ago

Looks like this is an object detection question. Looping in @derekjchow @jch1

EpochalEngineer commented 7 years ago

I noticed a difference between using the CUDA_VISIBLE_DEVICES environment variable and setting the config parameter. We're no longer able to reproduce this behavior with the environment variable, only with the config parameter. In addition, when using the config parameter, a small ~180 MB task appears on GPU 0 even when the config is set to use GPUs 1,2, which seems to correlate with these issues.
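For comparison, this is the environment-variable approach that no longer reproduces the problem for us (a minimal sketch; the device string "1" is illustrative):

```python
import os

# Must be set before TensorFlow initializes CUDA (i.e. before the first
# session or GPU op is created), otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

# With the environment variable, physical GPU 1 is the only device the
# process can see (it appears as /gpu:0), so no visible_device_list is
# needed in the session config.
sess = tf.Session()
```

With the config-parameter approach, by contrast, all GPUs remain visible to the process and TensorFlow merely restricts placement, which may explain the small allocation that still shows up on GPU 0.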

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you no longer need help on this issue, please consider closing it.