Open oak-tree opened 4 years ago
self.total_batches_seen is process-independent isn't? If process 1 changes the value, it wouldn't affect process 2.
When using threads, I agree that a race-condition will/may occurs, but it is rare that users use threads.
if self.index_array is None
this statement is should never true when used in fit_generator.
@Dref360 Hey, thanks for answering.
What would happen if seed
is not none? then every worker will get different kind of seed. Moreover, the total_batches_seen
is meaningless, isnt? as every async
run of get_index
will see the same version of total_batches_seen
i.e this will never get above 1.
I agree, my mistake, the workers should never get the chance to update/set index_array
because of the fact that index_array
is never None.
so I think that my question changed to :why is it necessary to update
the np.seed
from get_index
?
What would happen if seed is not none? then every worker will get different kind of seed.
Yeah we probably should use a multiprocess.Value there instead.
version of total_batches_seen i.e this will never get above 1.
Not really. In keras.utils.OrderedEnqueuer we spawn workers
processes using a pool.
Each worker has a queue of task and the pool resets every epoch.
So for each worker, total_batches_seen
would be the number of tasks done by this worker for this epoch.
Hope this helps!
@Dref360
Not really. In keras.utils.OrderedEnqueuer we spawn workers processes using a pool. Each worker has a queue of task and the pool resets every epoch. So for each worker, total_batches_seen would be the number of tasks done by this worker for this epoch.
Are you sure that the attributes are being changed? Because of the keras.utils.OrderedEnqueuer
uses async
from pool
I think that any change on the instance attributes is not being kept for the next iteration. i.e., every async
call is transferred to the state
of the iterator instance
to the worker in async manners. So changes on this call do not affect the original instance.
Sorry I wasn't clear enough. The original instance doesn't change, but the instance per worker will. As I said, probably not the right behavior.
am... important to note that each worker get new copy of the instance every async call. so it wont save the changed attributes
Background
Keras itrator
implements__get_index__
as following : https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image/iterator.py#L53`
This method is being used for
async access
while usingfit_generator
withmulti process workers
. Withasync
accessself.index_array
andself.total_batches_seen
will be seen differently for each worker.Regular use as
generator
Note that this issue does not happen as the
next
implementation does provide a lock mechanism for multi-thread access for calculating the next batch indices.