intel / ai-reference-models

Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs
Apache License 2.0

Run multiple Docker containers with Intel-optimized TensorFlow on one CPU with 8 physical cores / 16 logical cores #87

Open siwang2011 opened 3 years ago

siwang2011 commented 3 years ago

Hello, I find that Intel-optimized TensorFlow gives a great speedup in the training phase. I want to run 3 Docker containers on a CPU with 8 physical cores (16 logical cores), giving every container 4 logical cores. How should I set the parameters intra_op_parallelism_threads / inter_op_parallelism_threads and OMP_NUM_THREADS? When only one container runs, training takes 17s per epoch, but when I run 3 containers, training in each container takes 50s per epoch. In each container I set intra_op_parallelism_threads = inter_op_parallelism_threads = 2, OMP_NUM_THREADS = 2, and KMP_BLOCKTIME = 1. Please tell me why this happens?

venky-intel commented 3 years ago

Hi @siwang2011, when you are using the docker container from Intel you can provide the arguments intra_op_parallelism, inter_op_parallelism and OMP_NUM_THREADS as docker run environment variable args. It would look something like: docker run -e intra_op_parallelism=2 -e inter_op_parallelism=2 -e OMP_NUM_THREADS=2 -e KMP_BLOCKTIME=1 <intel-image-name> (note that the -e flags must come before the image name).

Another thing to note: although you can assign specific CPUs to a container and run several containers in parallel at the same time, that doesn't necessarily guarantee a performance improvement. Because each container's resources are constrained, training can take longer than when a single job uses the entire system with all cores shared. A sketch of per-container CPU pinning is shown below.
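For reference, here is a minimal sketch of pinning one of the three containers to dedicated cores while passing the threading variables. The image name is the same placeholder as above, the logical CPU IDs are an assumption about the host's numbering (verify the actual sibling layout with `lscpu -e`), and the environment variable names follow the comment above:

```bash
# Sketch: pin one container to 2 physical cores plus their hyper-thread
# siblings (often logical CPUs 0,1,8,9 on an 8-core/16-thread host;
# check the real core/sibling mapping with `lscpu -e`).
docker run -d \
  --cpuset-cpus="0,1,8,9" \
  -e intra_op_parallelism=2 \
  -e inter_op_parallelism=2 \
  -e OMP_NUM_THREADS=2 \
  -e KMP_BLOCKTIME=1 \
  <intel-image-name>
# Repeat for the other containers with non-overlapping --cpuset-cpus sets.
```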

The values you have set for intra_op/inter_op, OMP_NUM_THREADS etc. are based on the machine we initially tested on, and they were the best known configurations for that system. As your hardware specs change, these numbers should change as well.
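As a rough illustration of adapting the values: with 3 containers on 8 physical cores, each container effectively gets about 2 physical cores (4 hyper-threads), so a common starting point (an assumption to tune from, not a guaranteed optimum) is one OpenMP thread per physical core:

```bash
# Inside one container: confirm how many cores are actually visible,
# then size the thread pools from that (a rule of thumb, not a guarantee).
lscpu | grep -E '^CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket'

export OMP_NUM_THREADS=2                          # ~ physical cores per container
export KMP_BLOCKTIME=1
export KMP_AFFINITY=granularity=fine,compact,1,0  # commonly recommended for Intel-optimized TF
# and set intra_op_parallelism_threads=2, inter_op_parallelism_threads=1 or 2
```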

Check this link for more information about what each of those environment variables means and what values to set for your hardware configuration: https://github.com/IntelAI/models/blob/master/docs/general/tensorflow/GeneralBestPractices.md

siwang2011 commented 3 years ago

@venky-intel Hi! Thanks for your reply. I want to know which resources are constrained in my setup, and whether there is any way to solve this problem. My English is poor, sorry.

sramakintel commented 6 months ago

@siwang2011: do you still need assistance with the issue you were facing?