Closed erwinkendo closed 3 years ago
It is a bit unclear to me whether this is expected, or whether it should only occur with a standard data generator but not with a Sequence. I also get
ValueError: Using a generator with use_multiprocessing=True is not supported on Windows (no marshalling of generators across process boundaries). Instead, use single thread/process or multithreading.
with both (generator or Sequence) when using use_multiprocessing=True in Keras 2.2.0 on Python 3.6.6 under Windows 10 64-bit. I am also not sure how to do multithreading in Keras instead (the error message seems to suggest that might solve the problem, but I was unable to find any information on exactly how one would do that).
We do mimic Windows in the unit test using the 'spawn' method. So it "should" work. I don't own a Windows machine so I can't really help you.
@Dref360 Can you point me to some code I should try, which would be expected to work? Does one need to do anything specific to use the 'spawn' method?
Try running the tests in tests/keras/utils/data_utils.py
'spawn' (fork-exec) is the default on Windows. UNIX uses fork by default. In those tests, we try with both mechanisms.
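As a minimal illustration of the two start methods those tests exercise (plain Python, not Keras code):

```python
import multiprocessing as mp

def square(x):
    # Top-level function: under 'spawn' the child re-imports this module,
    # so the worker function must be importable (not a lambda or closure).
    return x * x

if __name__ == "__main__":
    # 'spawn' is the only start method on Windows; 'fork' is the
    # historical default on UNIX.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```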
I will test it from home on my proper system. I just realized that with a Sequence at least, use_multiprocessing=False and workers=4 (or so) does actually do multithreading. I had missed that, because my toy example did not spend enough time in the Sequence to make it really obvious. use_multiprocessing=True of course still hangs the system.
@Dref360 Sorry, finally got around to trying it. What file/filename do I need to provide in the data_utils.py code? I assume some filename needs to go into where it says __file__?
if __name__ == '__main__': pytest.main([__file__])
Or am I missing what the standard way of running these tests is?
Any suggestions from anyone how to test this on a windows system (I'm honestly not clear on what files are needed for the test script @Dref360 pointed to)?
pytest tests/keras/test_multiprocessing.py tests/keras/utils/
Follow the CONTRIBUTION.md for your setup.
This still seems to be an issue. When use_multiprocessing=True, it just hangs and literally nothing happens. I am running it on Windows 10.
Setting workers to a number bigger than 1 seems to improve the speed even with use_multiprocessing=False. Why does this setup improve it?
Additionally, I also wonder whether one needs to make one's generator class (built on Sequence) thread-safe. Since I am not able to set use_multiprocessing to True, I am wondering if I need to make my generator thread-safe, and whether the thread-safe version would give the desired performance improvement.
I even asked a question related to this topic (regarding how things should work on Windows 10) on Stack Overflow: https://stackoverflow.com/questions/52932406/is-the-class-generator-inheriting-sequence-thread-safe-in-keras-tensorflow But no one has replied so far...
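For what it's worth, a common community workaround (not an official Keras API) is to serialize access to a generator with a lock, so multiple worker threads can pull from it safely. A minimal sketch:

```python
import threading

class ThreadSafeIterator:
    """Wraps an iterator so that next() calls are serialized by a lock."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self.lock:
            return next(self.it)

def threadsafe(gen_fn):
    # Decorator: make any generator function produce thread-safe iterators.
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(gen_fn(*args, **kwargs))
    return wrapper

@threadsafe
def batches(n):
    for i in range(n):
        yield i
```

Note this only helps the multithreading path (workers > 1 with use_multiprocessing=False); it does nothing for the multiprocessing hang.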
@Dref360 Couldn't the Keras team update keras/utils/data_utils.py to pass a regular multiprocessing.Lock() at Pool creation time, using the initializer kwarg? This would make your lock instance global in all the child workers. See this Stack Overflow answer: Python sharing a lock between processes.
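The pattern from that answer, sketched outside of Keras (names here are illustrative):

```python
import multiprocessing as mp

_LOCK = None  # set in each worker by the initializer

def init_pool(lock):
    # Runs once in every child process; stores the inherited lock in a
    # module-level global so worker functions can reach it.
    global _LOCK
    _LOCK = lock

def work(i):
    with _LOCK:
        return i * 2

if __name__ == "__main__":
    lock = mp.Lock()
    # Passing the lock through initargs lets 'spawn' transfer it correctly;
    # passing it inside task arguments would fail to pickle.
    with mp.Pool(2, initializer=init_pool, initargs=(lock,)) as pool:
        print(pool.map(work, range(4)))  # [0, 2, 4, 6]
```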
What would be the purpose of this Lock?
@Dref360 To overcome the error on Windows: TypeError: can't pickle _thread.lock objects. (To be clear, that is the error I receive on Windows. With a global Lock, Windows users might at least be able to use OrderedEnqueuer through a generator derived from the Sequence class to utilize multiprocessing.)
@Dref360 I created a generator class that extends tensorflow.python.keras.preprocessing.image.Iterator. I only implement __init__ and _get_batches_of_transformed_samples. The problem is that Iterator itself contains a threading.Lock():

```python
def __init__(self, n, batch_size, shuffle, seed):
    ...
    self.lock = threading.Lock()
    ...
```

and uses it in its next() function to control index generation:
```python
def next(self):
    """For python 2.x.

    # Returns
        The next batch.
    """
    with self.lock:
        index_array = next(self.index_generator)
    # The transformation of images is not under thread lock
    # so it can be done in parallel
    return self._get_batches_of_transformed_samples(index_array)
```
When I try to use my generator and pass it to fit_generator, I inevitably get the error TypeError: can't pickle _thread.lock objects. Thread locks can be marshalled on Linux, but not on Windows.
My initial thought was to have Sequence hold a self.lock, then update init_pool and init_pool_generator in data_utils.py to accept a lock, changing the first lines to:

```python
global _SHARED_SEQUENCES, _LOCK
_SHARED_SEQUENCES = seqs
_LOCK = lk
```
and lastly update the _get_executor_init functions in the SequenceEnqueuer subclasses to add lk to initargs:

```python
return lambda seqs: mp.Pool(workers,
                            initializer=init_pool,
                            initargs=(seqs, lk))
```

Then, per the above-mentioned Stack Overflow answer, Iterator would inherit its lock from Sequence, but I think we would run into the same problem, because seqs gets passed to initargs, which would then contain locks.
Also, FYI, a separate class that just holds a threading lock doesn't work either (i.e. having Iterator extend (Sequence, SomeLockContainerClass)).
@Dref360 Dumb question, but is there any way the thread locking could be moved to a function outside the Iterator class and have next call it, similar to what was done with init_pool and seqs in data_utils.py? Or maybe just turning _flow_index into a Queue of indices to be accessed across processes?
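The index-queue idea could be sketched like this (illustrative names, not actual Keras internals): a producer fills a process-shared multiprocessing.Queue with batch index arrays, and workers consume from it without any thread lock living on the iterator object.

```python
import multiprocessing as mp

def fill_index_queue(q, n, batch_size):
    # Producer: push one list of sample indices per batch, then a sentinel.
    for start in range(0, n, batch_size):
        q.put(list(range(start, min(start + batch_size, n))))
    q.put(None)  # sentinel marks the end of the epoch

if __name__ == "__main__":
    q = mp.Queue()
    fill_index_queue(q, 10, 4)
    batches = []
    while True:
        item = q.get()
        if item is None:
            break
        batches.append(item)
    print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```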
Same here..! any solutions?
PRs are welcome. I cannot work on this issue as I do not use Windows.
I think the threading lock is unused when using the multiprocessing module. Can I remove the threading lock in the class Sequence?
Sorry, but I don't see which Lock you're talking about.
In the class Sequence there is no lock: https://github.com/keras-team/keras/blob/master/keras/utils/data_utils.py#L305.
We do have a Lock held by the _SEQUENCE_COUNTER: https://github.com/keras-team/keras/blob/master/keras/utils/data_utils.py#L450
and internally there is a Lock inside the Queue. Both of these are really important.
Could you point me to the code you're referring to?
I'm sorry, I mean the threading lock in the class Iterator: https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image/iterator.py#L43.
The multiprocessing module doesn't use threads, it uses processes. So is the threading lock necessary in each process? Can I remove it when I use multiprocessing on Windows?
I guess this lock is used when we do not use the OrderedEnqueuer, because we cannot iterate a generator from two different threads. If removing it really solves your problem, we could create the lock lazily in next().
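The lazy-lock idea might look roughly like this (a sketch, not the actual keras-preprocessing code): the lock is created on first use and dropped from the pickled state, so the iterator itself stays picklable under spawn.

```python
import threading

class LazyLockIterator:
    """Iterator that creates its lock on first use so instances pickle cleanly."""

    def __init__(self, data):
        self.data = data
        self.index = 0
        self._lock = None  # not created until needed, so pickling works

    @property
    def lock(self):
        if self._lock is None:
            self._lock = threading.Lock()
        return self._lock

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            if self.index >= len(self.data):
                raise StopIteration
            item = self.data[self.index]
            self.index += 1
            return item

    def __getstate__(self):
        # Drop the unpicklable lock when pickling; it is recreated lazily.
        state = self.__dict__.copy()
        state["_lock"] = None
        return state
```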
Still no solution?
@evrial @txyugood @mchaniotakis I don't think there is a solution, unfortunately. Tensorflow has historically been built for Linux OSs. The multiprocessing module is the best bet to get this working on Windows, but IMO it would take a complete rewrite of preprocessing.image.
I have a proposed "solution" that may interest others. Please note this is not a direct solution to the problem, but I believe a useful workaround. This is coming from my experience with Tensorflow 1.15 (I have yet to use version 2). Please also see the Stack Overflow question Is the class generator (inheriting Sequence) thread safe in Keras/Tensorflow?

_Install wsl version 2 on Windows, install Tensorflow in a Linux environment (e.g. Ubuntu) here, and then set use_multiprocessing to True to get this to work._

NOTE: The Windows Subsystem for Linux (WSL) version 2 is only available in Windows 10, Version 1903, Build 18362 or higher. Be sure to upgrade your Windows version in Windows Update to get this to work.
For multitasking and multithreading (i.e. parallelism and concurrency), there are two operations we must consider:

- forking = a parent process creates a copy of itself (a child) that has an exact copy of all the memory segments it uses
- spawning = a parent process creates an entirely new child process that does not share its memory, and the parent process must wait for the child process to finish before continuing

Linux supports forking, but Windows does not. Windows only supports spawning.
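You can check which start methods your platform supports directly:

```python
import multiprocessing as mp

# On Linux this typically prints ['fork', 'spawn', 'forkserver'];
# on Windows, 'spawn' is the only option.
print(mp.get_all_start_methods())
```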
The reason Windows hangs when using use_multiprocessing=True is that the Python multiprocessing module must use spawn on Windows. Hence, the parent process waits forever for the child to finish, because the parent cannot transfer its memory to the child, so the child doesn't know what to do.
On Windows, use_multiprocessing=True is not threadsafe. If you've ever attempted to use a data generator or sequence on Windows, you've probably seen an error like this:

ValueError: Using a generator with use_multiprocessing=True is not supported on Windows
(no marshalling of generators across process boundaries). Instead, use single
thread/process or multithreading.
Marshalling means "transforming the memory representation of an object into a data format that is suitable for transmission." The error is saying that unlike Linux, which uses fork, use_multiprocessing=True doesn't work on Windows because Windows uses spawn and cannot transfer its data to the child process.
At this point, you may be asking yourself:
"Wait... what about the Python Global Interpreter Lock (GIL)? If Python only allows one thread to run at a time, why does it even have the threading module, and why do we care about this in Tensorflow?!"
The answer lies in the difference between CPU-bound tasks and I/O-bound tasks:

- CPU-bound tasks = those that are waiting for data to be crunched
- I/O-bound tasks = those that are waiting for input or output from other processes (i.e. data transferring)

In programming, when we say two tasks are concurrent, we mean they can start, run, and complete in overlapping time. When we say they are parallel, we mean they are literally running at the same time.

So, the GIL prevents threads from running in parallel, but not concurrently. The reason this is important for Tensorflow is that concurrency is all about I/O operations (data transfer). A good dataflow pipeline in Tensorflow should try to be concurrent so that there is no lag time while data is being transferred to and from the CPU, GPU, and/or RAM, and training finishes faster. (Rather than have a thread sit and wait until it gets data back from somewhere else, we can have it executing image preprocessing or something else until the data gets back.)
The GIL was put into Python because everything in Python is an object. (This is why you can do "weird" things with "dunder/magic" methods, like (5).__add__(3) to get 8. Note that the parentheses are needed: without them, 5. is parsed as a float, so we have to take advantage of order of operations by using parentheses.) Without the GIL, threads mutating objects simultaneously would create a race condition and objects would be deleted "randomly". We could put a lock on each object, but then we would be unable to prevent deadlocks. The loss of parallel thread execution was seen by Guido (and by myself, though it is certainly arguable) as a minor loss, because we still maintained concurrent I/O operations, and tasks could still be run in parallel by running them on different CPU cores (i.e. multiprocessing). Hence, this is (one reason) why Python has both the threading and multiprocessing modules.

Now back to thread safety. When running concurrent/parallel tasks, you have to watch out for additional things. Two big ones are:
- race conditions - operations don't take exactly the same time to execute each time the program is run (hence, e.g., why we typically average results over a number of runs when using the timeit module). Because threads will finish at different times depending on the execution run, you will get different results with each run that are unpredictable a priori.
- deadlock - if two threads try to access the same memory at the same time, you'll get an error. To prevent this, we add a lock or mutex (mutual exclusion) to threads to prevent other threads from accessing the same memory while one is running. However, if two threads are locked, need to access the same memory, and each depends on the other to finish in order to execute, the program hangs in what is known as a deadlock.
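A tiny demonstration of using a lock to avoid a race on shared state (plain Python, unrelated to Keras):

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    # Without the lock, 'counter += 1' is a read-modify-write that can
    # interleave between threads and silently lose updates.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```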
I bring this up because Tensorflow needs to be able to pickle Python objects to move them between processes. (Pickling means serializing an object into a byte stream that can be transmitted and reconstructed elsewhere.) The Tensorflow Iterator.__init__() method contains a threading.Lock():

```python
def __init__(self, n, batch_size, shuffle, seed):
    ...
    self.lock = threading.Lock()
    ...
```
The problem is that Python cannot pickle threading lock objects, so on Windows (which must spawn rather than fork) thread locks cannot be marshalled to a child process.

If you try to use such a generator and pass it to fit_generator, you will get the error:

TypeError: can't pickle _thread.lock objects
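This limitation is easy to reproduce on any platform:

```python
import pickle
import threading

try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    # Python 3 reports something like: cannot pickle '_thread.lock' object
    print(exc)
```

On Linux, fork sidesteps the problem because the child inherits the parent's memory without any pickling.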
So, while use_multiprocessing=True works fine on Linux, it does not on Windows.
Solution: Around June 2020, Microsoft came out with version 2 of the Windows Subsystem for Linux (wsl). This was significant because it enabled GPU hardware acceleration. Version 1 was "simply" a translation layer between Windows NT and Linux, whereas wsl 2 runs an actual Linux kernel. Thus, you can now install Linux on Windows, open a bash shell from the command prompt, and (most importantly) access hardware. Thus, it is now possible to install tensorflow-gpu on wsl. In addition, you'll now be able to use fork.
**Thus, I recommend:**

1. Install wsl version 2 on Windows and add your desired Linux environment
2. Install tensorflow-gpu in a virtual environment in the wsl Linux environment here
3. Set use_multiprocessing=True to see if it works.

CAVEAT: I haven't tested this yet to verify that it works, but to the best of my limited knowledge, I believe it should.
I am still able to reproduce the issue on Python 3.9.0 and Tensorflow 2.6.0 on Windows 10.
I tried WSL 2 but the speedup relative to Windows 10 without multiprocessing was of just 20%.
Is there any alternative or solution today?
I am trying to get this same option to work on macOS Monterey. While other Python packages are able to use multiprocessing, I see no improvement at all with this option. I am running 8 cores, Tensorflow 2.6, with Keras Sequential models. I have:

workers=6,
use_multiprocessing=True

And there is zero difference between having this on or off.
Dear Keras community,

I have been using Keras successfully for many tasks. After implementing a custom data generator using the Keras Sequence class, I tried using the use_multiprocessing=True option of the fit_generator function, with more than 1 worker (so data can be fed to my GPU). Unfortunately, after testing this setup on 3 different machines, the code seems to work only on Linux (even with a different GPU).

Is this the expected behaviour on a Windows machine?

Kind regards,