Training is slow due to high Page-Fault rate

I see nothing has been said about "slow" training here for more than a year. I am using TensorFlow 1.12.0 because other configurations gave me errors. After reading this, I realized I had to find whichever combination works on Windows 10: TensorFlow 1.12.0 + cuDNN 7 + CUDA 9 (+ Python 3.5, etc.).

When I tested on a very small set of images (2), performance was great. When I added a few more images, performance became poor. I see people monitoring the GPU and the CPU, looking for the bottleneck. Well, here it is: the Python process is swapping ~200K memory pages between RAM and the pagefile per second, or more precisely per Task Manager update interval (the PF Delta column). My pagefile is on an SSD, but it is still very, very slow.
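For anyone who wants to reproduce the measurement outside Task Manager, here is a minimal sketch that samples the page-fault counter of a process once per interval. It assumes the third-party `psutil` package is installed; `watch`, `fault_rate`, and `read_fault_count` are names I made up for illustration. As far as I know, `psutil` reports `num_page_faults` only on Windows, and the counter includes both soft faults (resolved in RAM) and hard faults (read from disk), so a high number alone does not prove pagefile traffic.

```python
import time

try:
    import psutil  # third-party; assumed installed via `pip install psutil`
except ImportError:
    psutil = None


def fault_rate(prev_count, curr_count, seconds):
    """Page faults per second between two cumulative counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (curr_count - prev_count) / seconds


def read_fault_count(proc):
    """Cumulative page-fault count of a psutil.Process, or None if unavailable.

    memory_info() carries a `num_page_faults` field on Windows only.
    """
    return getattr(proc.memory_info(), "num_page_faults", None)


def watch(pid, interval=1.0, samples=10):
    """Print the page-fault rate and resident memory of `pid` once per interval."""
    if psutil is None:
        raise RuntimeError("psutil is not installed")
    proc = psutil.Process(pid)
    prev = read_fault_count(proc)
    if prev is None:
        raise RuntimeError("num_page_faults is not reported on this platform")
    for _ in range(samples):
        time.sleep(interval)
        curr = read_fault_count(proc)
        rss_mb = proc.memory_info().rss / 2**20
        print("faults/s: %.0f  rss: %.1f MB" % (fault_rate(prev, curr, interval), rss_mb))
        prev = curr
```

Calling `watch(pid)` with the PID of the training process (taken from Task Manager) while training runs should print roughly the same numbers as the PF Delta column when `interval` matches Task Manager's update speed.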

So why is Python generating so many page faults? The process uses around 1.6 GB of memory, so it probably needs more memory and takes it from virtual memory in the pagefile.

Do other versions of TensorFlow solve this? Any other ideas?

thtrieu / darkflow, issue #1163