Closed avinashcpandey closed 4 years ago
TensorFlow built with DNNL does not really change its behavior based on OMP_NUM_THREADS, at least in recent versions.
Here's code from threadpool_device.cc:45 contributed in 2018.
ThreadPoolDevice::ThreadPoolDevice(const SessionOptions& options,
                                   const string& name, Bytes memory_limit,
                                   const DeviceLocality& locality,
                                   Allocator* allocator)
    : LocalDevice(options, Device::BuildDeviceAttributes(
                               name, DEVICE_CPU, memory_limit, locality)),
      allocator_(allocator),
      scoped_allocator_mgr_(new ScopedAllocatorMgr(name)) {
#ifdef INTEL_MKL
#ifdef _OPENMP
  const char* user_omp_threads = getenv("OMP_NUM_THREADS");
  if (user_omp_threads == nullptr) {
    // OMP_NUM_THREADS controls MKL's intra-op parallelization
    // Default to available physical cores
    const int mkl_intra_op = port::NumSchedulableCPUs();
    const int ht = port::NumHyperthreadsPerCore();
    omp_set_num_threads((mkl_intra_op + ht - 1) / ht);
  } else {
    uint64 user_val = 0;
    if (strings::safe_strtou64(user_omp_threads, &user_val)) {
      // Superflous but triggers OpenMP loading
      omp_set_num_threads(user_val);
    }
  }
#endif  // _OPENMP
#endif  // INTEL_MKL
}
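To make the branch above easier to follow, here is a minimal standalone sketch of the same decision, with no TensorFlow dependencies. The names PhysicalCores and ResolveOmpThreads are hypothetical and do not exist in TensorFlow, the CPU counts are passed in as plain integers instead of coming from port::NumSchedulableCPUs() and port::NumHyperthreadsPerCore(), and the string parsing only approximates strings::safe_strtou64. A return value of -1 stands for the case where the constructor leaves the OpenMP setting untouched.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Ceiling division: logical CPUs divided by hyperthreads per core, rounded up.
// This mirrors (mkl_intra_op + ht - 1) / ht in the constructor above.
int PhysicalCores(int schedulable_cpus, int hyperthreads_per_core) {
  assert(hyperthreads_per_core > 0);
  return (schedulable_cpus + hyperthreads_per_core - 1) / hyperthreads_per_core;
}

// Hypothetical helper (not a TensorFlow function). Returns the value that
// would be passed to omp_set_num_threads, or -1 if the call would be skipped.
// env_value is the result of getenv("OMP_NUM_THREADS"); nullptr means unset.
long long ResolveOmpThreads(const char* env_value, int schedulable_cpus,
                            int hyperthreads_per_core) {
  if (env_value == nullptr) {
    // Variable not set: default to the number of physical cores.
    return PhysicalCores(schedulable_cpus, hyperthreads_per_core);
  }
  // Approximation of safe_strtou64: accept only a plain decimal number.
  if (!std::isdigit(static_cast<unsigned char>(env_value[0]))) return -1;
  errno = 0;
  char* end = nullptr;
  const long long value = std::strtoll(env_value, &end, 10);
  if (errno == ERANGE || *end != '\0') return -1;
  // Variable set and parseable: the user's value is passed through as is.
  return value;
}
```

With 16 schedulable CPUs and 2 hyperthreads per core, an unset variable resolves to 8, while OMP_NUM_THREADS=4 resolves to 4. In both cases only the OpenMP thread count is affected, not TensorFlow's own inter/intra-op thread pools.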
Some Python scripts do set intra/inter-op values based on OMP_NUM_THREADS, but this is easily fixable and has nothing to do with DNNL.
Thanks for the prompt reply. The other thing I am looking for: if I run TensorFlow with a single thread (using OMP_NUM_THREADS=1), what do I need to do to run DNNL multithreaded? I want to override the OMP_NUM_THREADS behaviour for DNNL through something like MKL_NUM_THREADS, which we use for BLAS routines with the MKL library.
@avinashcpandey, I'm not sure I understand the question. TensorFlow built with DNNL has a few knobs that manage threading behavior: inter_op_parallelism_threads, intra_op_parallelism_threads, and OMP_NUM_THREADS.
So I would say that for the experiment you want to run, you need something like
inter_op_parallelism_threads=1, intra_op_parallelism_threads=1, OMP_NUM_THREADS=N
inter_op_parallelism_threads=1, intra_op_parallelism_threads=N, OMP_NUM_THREADS=1
DNNL does not provide a way to control the number of threads it uses other than the mechanisms provided by the threading runtime.
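As an illustration of what "mechanisms provided by the threading runtime" can look like, here is a minimal sketch of a scope guard that changes the OpenMP thread count for one region of a standalone C++ program and restores it afterwards. It assumes DNNL was built against the OpenMP runtime and that the DNNL calls are made from the same thread inside the guarded scope. ScopedOmpThreads, SetThreads, and CurrentMaxThreads are hypothetical names, not part of DNNL or TensorFlow. Only omp_set_num_threads and omp_get_max_threads are real OpenMP calls, and a stub is used when the file is compiled without OpenMP.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
// Thin wrappers over the real OpenMP runtime calls.
inline void SetThreads(int n) { omp_set_num_threads(n); }
inline int CurrentMaxThreads() { return omp_get_max_threads(); }
#else
// Fallback stub so the sketch still compiles without OpenMP support.
namespace { int g_stub_threads = 1; }
inline void SetThreads(int n) { g_stub_threads = n; }
inline int CurrentMaxThreads() { return g_stub_threads; }
#endif

// Hypothetical RAII guard: sets the thread count on construction and
// restores the previous value on destruction.
class ScopedOmpThreads {
 public:
  explicit ScopedOmpThreads(int n) : previous_(CurrentMaxThreads()) {
    assert(n > 0);
    SetThreads(n);
  }
  ~ScopedOmpThreads() { SetThreads(previous_); }

  // Non-copyable: exactly one restore per guard.
  ScopedOmpThreads(const ScopedOmpThreads&) = delete;
  ScopedOmpThreads& operator=(const ScopedOmpThreads&) = delete;

 private:
  int previous_;
};

// Usage (assumption: DNNL primitives executed inside the braces pick up the
// thread count from the OpenMP runtime):
//   {
//     ScopedOmpThreads guard(8);
//     // ... execute DNNL primitives here ...
//   }  // previous thread count restored here
```

This is not a DNNL-specific control. It adjusts the OpenMP runtime setting, so any other OpenMP code running in that scope is affected in the same way.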
Thanks Vadim! I got your point.
I am running some TensorFlow experiments with and without DNNL, using OMP_NUM_THREADS=1 and higher values. If I set OMP_NUM_THREADS, it affects TensorFlow as well as DNNL, since both have their own parallel implementations.
I want to run TensorFlow with a single thread (OMP_NUM_THREADS=1) and DNNL multithreaded. How do I do that? I want to see how DNNL functions perform when TensorFlow is single-threaded and DNNL is multithreaded.
When I set OMP_NUM_THREADS, it affects both. Is there anything in DNNL, similar to MKL_NUM_THREADS, to control this?
Environment
TF built with DNNL, with OMP_NUM_THREADS exported.
Actual behavior
Doing good
Expected behavior
Doing good