VideoPac opened this issue 4 years ago
@VideoPac a lot of things have changed in TF 2.x since we released the book, see what old version we were using back then: https://github.com/maxpumperla/deep_learning_and_the_game_of_go/blob/master/code/setup.py#L10
I think you should be good to go once you roll back to 1.13.x. At some point @macfergus and I need to go back and revise everything for TF 2.x, and add proper testing etc. for such situations.
@maxpumperla thanks a lot for your answer, and what a great book you wrote, btw! I am not a very experienced programmer, and I have learned more about DL reading through chapter 7 than with any other book or tutorial before. And it's such great fun to build a Go bot :)
Back to my issues: I struggled for hours to conda/pip install the right versions of numpy, tensorflow and keras as shown in setup.py, but I always got some incompatibility issues. Finally, I just managed to pip install everything without incompatibilities using:
But still when I try to launch train_generator I get constant error messages:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
and I'm not even able to stop the script by pressing Ctrl-C in the command prompt.
I am now starting my third day trying to make this work... Please help: what did I do wrong, and what should I try next? Thanks!
@VideoPac thanks, good to hear you like the book!
So this all boils down to multiprocessing issues on Windows, I'm afraid. As I do not have a Windows machine right now, it's difficult for me to help you directly. Can you point me to the exact script you're running? This explanation should help you (it's PyTorch, but the same root cause):
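In a nutshell: under the spawn start method (the only one available on Windows), every worker process re-imports the main module, so any top-level code that creates a `Pool` would run again in each child. A minimal sketch of the guard idiom, using illustrative names rather than the book's code:

```python
import multiprocessing

def square(x):
    # work done in each child process
    return x * x

def main():
    # pool creation must only happen in the parent process
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))  # -> [1, 4, 9, 16]

if __name__ == "__main__":
    # children re-import this module under spawn; this guard stops
    # them from recursively creating pools of their own
    main()
```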
The script I'm trying to run is train_generator.py inside the code/examples folder.
But even if I just try to run the short script from listing 7.17 in the book:
```python
from dlgo.data.parallel_processor import GoDataProcessor

processor = GoDataProcessor()
generator = processor.load_go_data('train', 100, use_generator=True)
print(generator.get_num_samples())
generator = generator.generate(batch_size=10)
X, y = generator.next()
```
besides the fact that the next() method doesn't seem to be defined anywhere (?), I get the same RuntimeError popping up continuously, like this:
.....
KGS-2004-19-12106-.tar.gz 12106
KGS-2003-19-7582-.tar.gz 7582
KGS-2002-19-3646-.tar.gz 3646
KGS-2001-19-2298-.tar.gz 2298
Using TensorFlow backend.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 106, in spawn_main
exitcode = _main(fd)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 115, in _main
prepare(preparation_data)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 226, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\John\Anaconda3\envs\kaa4\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\John\Desktop\xtest\deep_learning_and_the_game_of_go\code\dlgo\data\my_tests\generator_load.py", line 18, in <module>
generator = processor.load_go_data('train', 100, use_generator=True)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\site-packages\dlgo-0.2-py3.5.egg\dlgo\data\parallel_processor.py", line 41, in load_go_data
index.download_files()
File "C:\Users\John\Anaconda3\envs\kaa4\lib\site-packages\dlgo-0.2-py3.5.egg\dlgo\data\index_processor.py", line 58, in download_files
pool = multiprocessing.Pool(processes=cores)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\context.py", line 118, in Pool
context=self.get_context())
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\pool.py", line 174, in __init__
self._repopulate_pool()
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
w.start()
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\popen_spawn_win32.py", line 34, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 144, in get_preparation_data
_check_not_importing_main()
File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 137, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
>>> Reading cached index page
KGS-2019_04-19-1255-.tar.gz 1255
KGS-2019_03-19-1478-.tar.gz 1478
KGS-2019_02-19-1412-.tar.gz 1412
KGS-2019_01-19-2095-.tar.gz 2095.....
Right, but when I give you a link with a potential solution, why don't you at least try it? :D This is a multiprocessing issue, and the code (loading the Go data) uses multiprocessing. So instead try:
```python
from dlgo.data.parallel_processor import GoDataProcessor

def main():
    processor = GoDataProcessor()
    generator = processor.load_go_data('train', 100, use_generator=True)
    print(generator.get_num_samples())
    generator = generator.generate(batch_size=10)
    X, y = generator.next()

if __name__ == "__main__":
    main()
```
That would at least be good to confirm. P.S.: `next` comes with Python generators: https://stackoverflow.com/questions/1073396/is-generator-next-visible-in-python-3-0
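As the linked answer explains, Python 3 generators no longer have a `.next()` method; use the built-in `next()` instead. A quick illustration, independent of the dlgo code:

```python
def counter():
    # a trivial generator, standing in for the book's data generator
    yield 1
    yield 2

gen = counter()
print(next(gen))       # -> 1: works in both Python 2 and 3
print(gen.__next__())  # -> 2: what gen.next() was renamed to in Python 3
# gen.next() itself raises AttributeError on Python 3
```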
My bad, I was working through the solution you gave me at the same time, of course, but couldn't figure out which part of the code to apply it to... I thought it should go somewhere in parallel_processor and couldn't make it work... it was very obvious, in fact. And OK, generator.next() should be changed to next(generator), I get it. Anyway, thanks a lot for your help. I'm almost there, I guess: I now at least have the first epoch running in train_generator. I still get a:
H5pyDeprecationWarning:
The default file mode will change to 'r' (read-only) in h5py 3.0.
To suppress this warning, pass the mode you need to h5py.File(),
or set the global default h5.get_config().default_file_mode,
or set the environment variable H5PY_DEFAULT_READONLY=1.
and a
KeyError: 'Cannot set attribute. Group with name "keras_version" exists.'
after the first epoch, but I'll try to fix those by myself before asking for help :) Should I commit the changes to the chapter 7 branch once done?
@VideoPac no worries, happy to help.
Yeah, if you could open a PR, that'd be amazing! (We should make sure, however, that the code works with both Python 2 and 3 if possible.)
Your h5py message is just a warning and can be ignored. The other one is googleable: https://github.com/keras-team/keras/issues/11276
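If you'd rather silence the h5py warning than ignore it, the warning text itself names two options; a minimal sketch (the filename in the comment is illustrative, not from the repo):

```python
import os

# Option 1, from the warning text: set this before h5py is first
# imported, and file opens default to read-only mode.
os.environ['H5PY_DEFAULT_READONLY'] = '1'

# Option 2, the cleaner fix: pass the mode explicitly wherever the
# code opens a file, e.g.
#     with h5py.File('model.h5', 'r') as f:
#         ...
```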
Hi all!! I'm having the same (or a similar) issue. I'm using PyCharm with Python 3.8 on a Windows machine.
If I use the solution quoted above:
```python
# import stuff needed

def main():
    processor = GoDataProcessor()
    generator = processor.load_go_data('train', 100, use_generator=True)
    # more code
    # more code

if __name__ == "__main__":
    main()
```
And then run the code (green play button), it runs fine and I can train models. But I would like to run the code in the console, line by line, which is a much better way to learn.
If I try to run the code in the console line by line, when I run
```python
generator = processor.load_go_data('train', 1, use_generator=True)
```
I get an endless feed of repetitions of the following:
File "<string>", line 1, in <module>
File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 264, in run_path
code, fname = _get_code_from_file(run_name, path_name)
File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 234, in _get_code_from_file
with io.open_code(decoded_path) as f:
OSError: [Errno 22] Invalid argument: 'C:\\Data\\Mehlernas\\DatosCurrent\\DLGO - code\\<input>'
Note that I don't get this error:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
I do get that error if I try to run the script (green play button) without the def main(): wrapper, like this:
```python
# import stuff needed

processor = GoDataProcessor()
generator = processor.load_go_data('train', 100, use_generator=True)
```
Does any of you know if there is a way to get the code working in the console, line by line?
I guess I could just use the def main(): approach and step through with the debugger to see what happens line by line...
Thanks!!
I ran into several issues:
1) When I run the code as in listing 7.17, I get an error because .next() is not defined, and indeed it isn't.
2) If I skip the preceding issue and jump into train_generator, I get several errors:
I think I fixed this one by adding:
just before
```python
pool = multiprocessing.Pool(processes=cores)
```
in parallel_processor, but I'm not sure that's the right way to proceed.
3) Again in train_generator I get an:
AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'experimental_list_devices'
which I attempted to fix by adding:
before the code, but again, even though it seems to work, I'm not sure this is the correct fix.
4) Last but not least, in train_generator:
```
ValueError: `validation_steps=None` is only valid for a generator based on the
`keras.utils.Sequence` class. Please specify `validation_steps` or use the
`keras.utils.Sequence` class.
```
This one I haven't yet figured out how to solve.
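For what it's worth, the usual fix for issue 4 is to pass `validation_steps` explicitly: old Keras only infers it for `keras.utils.Sequence` inputs, and the value is just the number of validation batches per epoch. A hedged sketch of the arithmetic; the `fit_generator` call in the comment is illustrative, with names like `train_gen`/`test_gen` assumed, not taken from the book's train_generator.py:

```python
# illustrative numbers, not from the book's dataset
num_validation_samples = 1000   # samples produced by the test generator
batch_size = 128

# one pass over the validation set = this many batches per epoch
validation_steps = num_validation_samples // batch_size
print(validation_steps)  # -> 7

# then pass it explicitly, e.g. (sketch of an old-Keras call):
# model.fit_generator(
#     generator=train_gen.generate(batch_size, num_classes),
#     steps_per_epoch=train_gen.get_num_samples() // batch_size,
#     validation_data=test_gen.generate(batch_size, num_classes),
#     validation_steps=test_gen.get_num_samples() // batch_size,
#     epochs=epochs)
```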
Can you guys make the code work? Please help, thanks