Closed dofuuz closed 3 years ago
Same problem. Any update?
Is this the same as #5510? If so, I believe this is a windows error right now. What OS are you using?
@eliafrigieri are you using windows?
I'm using: windows 10, python 2.7.12, keras 1.0.8 with theano.
@eliafrigieri @DofuUZ I am starting to believe this is a windows issue. I get the same issue with tf and python 3.5. Good to know it manifests with python 2.7.
I can confirm the same issue. Windows 10, python 3.5, keras with theano.
It's been 6 years since I've mucked with Python multiprocessing, and that was Python 3.4. I'm happy to contribute experience. Can anyone else offer assistance?
I've made my own solution that avoids this issue, basically rewriting the function that creates, populates and returns the queue and the stopping event (generator_queue in training.py).
@eliafrigieri Any chance you could post here? It would be much appreciated. I've been trying out a load of things over the past couple of days, to no avail...
It depends, what is your task?
I've got a large dataset and am trying to speed up training time on this task here: https://www.kaggle.com/c/data-science-bowl-2017. The windows issues with multiprocessing are proving quite painful. Edit: if what you've got is sensitive then no worries, you don't need to post it. I was only asking on the off chance that it was something inconsequential.
I have a large dataset too. I've added some functions that load batch-size groups of images and put them into the queue. For example, if you have 1000 images, you can split them into 10 groups of 100 images each and launch 10 processes for parallel loading; then a single process gets from the queue and calls train_on_batch. I'm not posting the code because it is too badly written and not the definitive version for my task (I will probably change the code every day for the next two weeks, just to speed up the parallel loading).
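The code for this approach was never posted, but the idea described above (split the dataset into groups, load each group in its own process, then have a single consumer drain the queue and call train_on_batch) can be sketched roughly as follows. The names load_batch and parallel_batches, and the string placeholder standing in for image decoding, are illustrative assumptions, not from the thread:

```python
import multiprocessing as mp

def load_batch(paths, queue):
    # Worker defined at module level so Windows' spawn mode can pickle it.
    # Stands in for real image loading (reading files, decoding to arrays).
    batch = [p.upper() for p in paths]  # placeholder for decoded images
    queue.put(batch)

def parallel_batches(all_paths, n_workers):
    # Split the path list into n_workers groups and load them in parallel.
    # Assumes len(all_paths) is divisible by n_workers for simplicity.
    queue = mp.Queue()
    chunk = len(all_paths) // n_workers
    workers = []
    for i in range(n_workers):
        sub = all_paths[i * chunk:(i + 1) * chunk]
        w = mp.Process(target=load_batch, args=(sub, queue))
        w.start()
        workers.append(w)
    # Drain the queue before joining so workers are never blocked on a full queue.
    batches = [queue.get() for _ in range(n_workers)]
    for w in workers:
        w.join()
    return batches

if __name__ == "__main__":
    paths = ["img%03d" % i for i in range(1000)]
    batches = parallel_batches(paths, n_workers=10)
    # A single consumer process would call model.train_on_batch(...) per batch here.
    print(len(batches))  # 10 groups of 100 paths each
```

The key point for Windows is that load_batch is a top-level function receiving everything it needs through args, rather than a closure capturing local state.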
@eliafrigieri Very cool! Are you able to share?
I was trying all sorts of options and I think I did something similar to you, @eliafrigieri. I was trying to create an external data_generator class that would use multiprocessing to populate a queue. The class was an iterator, so the queue would be accessed via next(). It's part of a larger code base, but I extracted an example here (attached: data_generator_sample_main.txt, data_generator_sample.txt).
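The attached files aren't reproduced in this thread, but the pattern described above (an iterator class whose next() pulls from a queue that a child process fills) can be sketched like this. The class and function names here are illustrative, not taken from the attachments:

```python
import multiprocessing as mp

def _fill_queue(queue, n):
    # Module-level producer so it can be spawned on Windows.
    # Stands in for real batch loading; squares are just placeholder data.
    for i in range(n):
        queue.put(i * i)
    queue.put(None)  # sentinel marking end of data

class DataGenerator(object):
    """Iterator wrapping a multiprocessing-fed queue: batches are produced
    in a child process and consumed via next() in the training loop."""

    def __init__(self, n):
        self.queue = mp.Queue()
        self.proc = mp.Process(target=_fill_queue, args=(self.queue, n))
        self.proc.start()

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()
        if item is None:
            self.proc.join()
            raise StopIteration
        return item

    next = __next__  # Python 2 compatibility (the thread mixes 2.7 and 3.5)
```

A training loop could then do `for batch in DataGenerator(n): ...` while loading proceeds in the background.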
The problem I was having was that each multiprocessing pool imported keras, so I was getting all sorts of CNMEM warnings and everything looked like it was overflowing.
Does anyone have any insights on this?
I'm in the same situation. Every process I create imports keras (I think because it's a keras process creating the child), but once all the processes are created and running, the loading speed increases a lot.
In case it helps, I was able to get the sample running without importing keras in child processes by putting the import keras commands inside the get_model() function in data_generator_sample, as this was the only place they were used. I'm not sure I'll be able to get it working like that in the full version of my project, but it may be an option for some people.
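The deferred-import trick above can be shown in a minimal, self-contained form. Since keras isn't needed to illustrate the mechanism, a lightweight stdlib module stands in for it here; the point is only that the import statement lives inside the function body:

```python
def get_model():
    # Deferred import: the heavy dependency (keras in the real code, a
    # stdlib stand-in here) is loaded only in the process that actually
    # calls get_model(). Multiprocessing workers that never build a model
    # therefore never pay for the import or trigger GPU/CNMEM init.
    import colorsys  # stand-in for e.g. `from keras.models import Sequential`
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

On Windows, where child processes re-import the parent module, this keeps module-level code cheap and confines the expensive import to whichever process calls the function.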
I thought of the same solution, but it's not applicable in my case, so I didn't try it at all. The question is: does this issue appear only on Windows? Is it a bug in keras?
It appears to be a bug in windows but it could also be a poor assumption made in keras wrt multiprocessing that manifests on windows.
My understanding is that when using multiprocessing on Windows you can't reference local variables from the point at which you start the process; you need to pass all variables explicitly via the args parameter. I think it should be possible to adapt the keras code by defining a multiprocessing version of data_generator_task() outside the scope of the generator and passing the generator / stop event / queue etc. into it. That way it could work as a standalone function and could be spawned across multiple processes on any platform. It may be best for windows users to try this option.
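A minimal sketch of that restructuring, assuming names loosely modeled on keras' internals (data_generator_task is in the source; make_gen is a hypothetical helper). One caveat worth noting: a generator object itself is not picklable, so under Windows' spawn mode you would pass a picklable factory and build the generator inside the child, rather than passing the generator directly:

```python
import multiprocessing as mp

def data_generator_task(make_gen, stop, queue):
    # Module-level worker: on Windows, multiprocessing spawns a fresh
    # interpreter and pickles the target plus its args, so this cannot
    # be a closure and cannot capture local variables from the caller.
    gen = make_gen()  # build the generator inside the child process
    for item in gen:
        if stop.is_set():
            break
        queue.put(item)

def make_gen():
    # Picklable top-level factory standing in for a real data generator.
    return iter(range(5))

if __name__ == "__main__":
    stop = mp.Event()
    queue = mp.Queue()
    p = mp.Process(target=data_generator_task, args=(make_gen, stop, queue))
    p.start()
    items = [queue.get() for _ in range(5)]
    p.join()
    print(items)
```

Everything the worker touches (factory, stop event, queue) arrives through args, which is exactly the constraint Windows imposes.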
@ciararogerson This sounds reasonable and it looks like it is consistent with the example in #5510. Would you agree?
Today I tried the "multiprocessing.py" test on a Mac with the same configuration as mine (python 2.7.13, keras 1.0.8), and it works fine. So the problem is only on Windows; now we have the evidence.
@eliafrigieri Great work!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
This is still an issue for python 3.5, windows 7.
Will anybody fix it eventually?
Looks like there is a PR created and merged for this. So can this be closed now?
I have the same issue in predict_generator; however, fit_generator works fine with multiprocessing.
I'm trying to use fit_generator to separate the data loader from the trainer.
Executing this code produces an error like the one below:
I also tried to test keras/tests/keras/test_multiprocessing.py, but it failed.
Here is the output for test_multiprocessing.py: test_multiprocessing.faillog.txt
Is it a bug in Keras itself? Are any fixes available?