Closed ghost closed 4 years ago
This could be difficult. Multi-threading is introduced in several places, and not all of them may be easy to limit. I'm not aware that the development team has considered running in a thread-starved environment, except for the mobile runtime, which is probably not useful to you.
The TF runtime is mostly a C++ runtime, but most users call it via a Python wrapper for graph construction. Since you mention Slim, I assume that's what you're doing.
The Python interpreter is going to require some minimal number of threads, and then any Python-coded TF functionality you use could explicitly introduce more threads, for example by preprocessing input in parallel. It looks like the slim --num_preprocessing_threads flag controls such a feature. Adding @sguada for any more advice on thread-limiting Slim.
Threads introduced by the C++ runtime are completely separate from any used by Python. The C++ runtime can have several thread pools: principally one or more for CPU kernel execution, and another for potentially blocking I/O-related functions. The former pools are of fixed size and can be configured by options here. Unfortunately, I don't think the latter is limitable in a distributed configuration. Also, in the distributed configuration you'll probably end up with yet another thread pool of unknown size servicing the RPC system. Since you're trying to run on a supercomputer, I'd guess that's happening at every node.
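Separately from those pools, builds of TF whose CPU kernels use OpenMP/MKL also read environment variables that cap kernel-execution threads. Whether a given build honors them is an assumption about that build, and they must be set before the library is first imported; a minimal sketch:

```python
import os

# Must be set before TensorFlow is first imported; only MKL/OpenMP-backed
# builds honor this variable (an assumption about the build in use).
os.environ["OMP_NUM_THREADS"] = "1"

print(os.environ["OMP_NUM_THREADS"])  # "1"
```

Setting this from inside the script only works if it runs before the first `import tensorflow`; otherwise export it in the job's environment.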
@poxvoculi Thank you so much for your explanations. Actually, I don't know what I should do to fix this problem.
You should try to figure out how many threads are spawned by which parts of your program. From the diagram above, it looks like something executing in Python is spawning 128 threads. Track that down in more detail. Unfortunately I'm not adept at Python and can't give much help on how to do that; maybe search the source code or use some kind of debugger.
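One way to watch the total from inside the process: `threading.enumerate()` only sees Python-level threads, but on Linux the kernel's count (which includes threads spawned by the C++ runtime) can be read from `/proc/self/status`. A minimal, Linux-only sketch:

```python
import threading


def count_os_threads():
    """Total OS threads in this process, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("Threads: line not found")


before = count_os_threads()

# Spawn one blocked worker so the count visibly changes.
stop = threading.Event()
worker = threading.Thread(target=stop.wait)
worker.start()
after = count_os_threads()

print("before: %d, after: %d" % (before, after))

stop.set()
worker.join()
```

Calling `count_os_threads()` at various points (e.g. before and after building the graph, creating the session, and the first `run()`) narrows down which step spawns the threads.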
Hi guys. Thanks for the information above. I have the same problem. I tried setting one thread in SessionOptions and RunOptions, but the inference program still spawns 10 threads. For my task, thread limitation is a must :(. Is there any other place you could think of? Thanks again.
Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
How do you confine TensorFlow to a single core and single thread?
Do we have any options to control the number of threads in TF-Slim, in both the training and evaluation processes?
Specifically, I use this network for my classification problem. I changed the evaluation part so that it runs training and evaluation in parallel, like this code. I can run it on my own CPU without any problem, but I can't execute it on a supercomputer. It seems related to the very large number of threads being created by TensorFlow: if the number of threads exceeds the maximum pre-set in SLURM (= 28), the job fails. Since it's unable to create new threads, it ends up with the error "Resource temporarily unavailable".
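For context, "Resource temporarily unavailable" is what `pthread_create` reports (EAGAIN) when a per-user task limit is hit. Assuming a bash shell on the compute node, the effective limit can be inspected with the shell's builtin:

```shell
# Show the per-user limit on the number of processes/threads; pthread_create
# fails with EAGAIN ("Resource temporarily unavailable") once it is exceeded.
# On SLURM nodes this limit is often imposed by the scheduler configuration.
ulimit -u
```

Running this inside an `srun`/`sbatch` allocation (rather than on the login node) shows the limit the job actually runs under.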
This error appears when the code tries to restore parameters from checkpoints. If there is no limitation on the number of threads (as on my PC), it works fine:
However, when there is a limitation on the number of threads (as with SLURM job submission on supercomputers), we get:
I tried to limit the number of CPU threads used by TensorFlow to 1 by creating a config like:
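The exact snippet isn't shown here; a typical TF 1.x configuration along these lines (the session usage around it is an assumption) looks like the following, guarded so it degrades gracefully where TensorFlow is unavailable:

```python
# Hedged sketch of the kind of config meant above (TF 1.x API assumed).
try:
    import tensorflow as tf

    config = tf.ConfigProto(
        intra_op_parallelism_threads=1,  # pool used inside a single op
        inter_op_parallelism_threads=1,  # pool running independent ops
        device_count={"CPU": 1},
    )
    with tf.Session(config=config) as sess:
        result = int(sess.run(tf.constant(42)))
except (ImportError, AttributeError):
    result = None  # TF missing, or a TF 2.x build without ConfigProto

print(result)
```

Note that these options cap only the session's compute pools; the I/O and RPC threads mentioned earlier in the thread are unaffected.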
But unfortunately, that didn't help. In my opinion, the main problem is that we are not able to control the number of threads: although we set it to 1 with various TF options, you can see that this job is creating many more threads on the node:
The training script is creating 128 threads and the evaluation script 8 (both numbers vary over time).
P.S. I'm using Python 2.7.13 and TensorFlow 1.3.0.