matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Inference mode with IMAGES_PER_GPU > 1 gets stuck on prediction #1491

Open trimmurrti opened 5 years ago

trimmurrti commented 5 years ago

For some weird reason, if I set IMAGES_PER_GPU > 1 (which in my case equals the batch size, since I have only one GPU), the model gets stuck on the predict call.

I would greatly appreciate any help. None of the workarounds I experimented with helped, including (but not limited to) forcing _make_predict_function and finalizing the graph.

Detection consumes a huge amount of RAM (about 30% of a p2.xlarge's memory, which is a lot), while nvidia-smi shows no GPU utilization at all.

Here are the specs I use: K80 with 12 GB of memory (AWS p2.xlarge)

CUDA versions: cuda-command-line-tools-9-0 cuda-cublas-9-0 cuda-cufft-9-0 cuda-curand-9-0 cuda-cusolver-9-0 cuda-cusparse-9-0 libcudnn7=7.0.4.31-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0 nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0 libnvinfer4=4.1.2-1+cuda9.0

pip versions: tensorflow==1.5.0 tensorflow-gpu==1.5.0

Any help would be highly appreciated. The issue is reproducible in any of the sample Python notebooks after tweaking them to increase the batch size.
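For context, Mask R-CNN derives the batch size as GPU_COUNT * IMAGES_PER_GPU, and model.detect() expects exactly that many images per call, so any input list has to be padded or split to match. A minimal sketch of that batching logic, assuming a hypothetical pad_batch helper (not part of the library):

```python
# Mask R-CNN computes BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU, and
# model.detect() expects exactly BATCH_SIZE images per call.
GPU_COUNT = 1
IMAGES_PER_GPU = 2  # batch size > 1, as in this report
BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU

def pad_batch(images, batch_size):
    """Pad a list of images to a multiple of batch_size by
    repeating the last image (pad_batch is an illustrative helper)."""
    if not images:
        return images
    remainder = len(images) % batch_size
    if remainder:
        images = images + [images[-1]] * (batch_size - remainder)
    return images

# Three inputs padded up to two full batches of size 2:
batch = pad_batch(["img0", "img1", "img2"], BATCH_SIZE)
print(len(batch))  # 4
```

With the list padded this way, detection can be driven in fixed-size chunks (e.g. model.detect(batch[i:i + BATCH_SIZE]) in a loop), with the results for the padded duplicates discarded afterwards.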

nikhil-occipitaltech commented 5 years ago

+1