maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go

Chapter7 many issues #63

Open VideoPac opened 4 years ago

VideoPac commented 4 years ago

I've run into several issues:

1) When I run the code from listing 7.17, I get an error because .next() is not defined, and indeed it isn't.

2) If I skip the preceding issue and go straight to train_generator, I get several errors:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I think I fixed this one by adding:

if __name__ == '__main__':
    freeze_support()

just before pool = multiprocessing.Pool(processes=cores) in parallel_processor, but I'm not sure that's the right way to proceed.

3) Again in train_generator, I get an AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'experimental_list_devices', which I attempted to fix by adding:

import tensorflow as tf
import keras.backend.tensorflow_backend as tfback

def _get_available_gpus():
    """Get a list of available gpu devices (formatted as strings).

    # Returns
        A list of available GPU devices.
    """
    #global _LOCAL_DEVICES
    if tfback._LOCAL_DEVICES is None:
        devices = tf.config.list_logical_devices()
        tfback._LOCAL_DEVICES = [x.name for x in devices]
    return [x for x in tfback._LOCAL_DEVICES if 'device:gpu' in x.lower()]

# Replace Keras's own version, which calls the removed
# tf.config.experimental_list_devices()
tfback._get_available_gpus = _get_available_gpus

before the rest of the code, but again, even though it seems to work, I'm not sure this is the correct fix.

4) Last but not least in train_generator:

ValueError: validation_steps=None is only valid for a generator based on the keras.utils.Sequence class. Please specify validation_steps or use the keras.utils.Sequence class.

I haven't figured out how to solve this one yet.

Can you guys make the code work? Please help, thanks
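Regarding point 4: Keras can't ask a plain Python generator how many batches it will produce (only a keras.utils.Sequence reports its own length), so fit_generator needs explicit steps_per_epoch and validation_steps. Below is a minimal sketch of computing them, assuming the dlgo generator objects expose get_num_samples() as in listing 7.17; steps_for is a hypothetical helper, and the commented call only outlines where the values would go in the Keras 2.x API.

```python
def steps_for(num_samples, batch_size):
    """Hypothetical helper: number of full batches in num_samples, at least 1.

    Keras expects an integer step count, so use integer division rather
    than num_samples / batch_size (which gives a float in Python 3).
    """
    return max(1, num_samples // batch_size)


# Outline only (Keras 2.x API), assuming `generator` and `test_generator`
# are dlgo data generators and `model` is already compiled:
#
# model.fit_generator(
#     generator=generator.generate(batch_size=batch_size),
#     steps_per_epoch=steps_for(generator.get_num_samples(), batch_size),
#     validation_data=test_generator.generate(batch_size=batch_size),
#     validation_steps=steps_for(test_generator.get_num_samples(), batch_size),
#     epochs=epochs)

print(steps_for(1000, 128))  # prints 7
```

Counting only full batches is a conservative choice: if the generator loops indefinitely, Keras simply ends each epoch after that many steps.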

maxpumperla commented 4 years ago

@VideoPac a lot of things have changed in TF 2.x since we released the book, see what old version we were using back then: https://github.com/maxpumperla/deep_learning_and_the_game_of_go/blob/master/code/setup.py#L10

I think you should be good to go once rolling back to 1.13.x. At some point @macfergus and I need to go back and revise everything for TF 2.x, and add proper testing etc. for such situations.

VideoPac commented 4 years ago

@maxpumperla thanks a lot for your answer, and what a great book you wrote, btw! I'm not a very experienced programmer, and I have learnt more about DL reading through chapter 7 than from any other book or tutorial before. And it's such great fun to build a Go bot :)

Back to my issues: I struggled for hours to conda/pip install the right versions of numpy, tensorflow and keras as listed in setup.py, but I always ran into some incompatibility. Finally, I managed to pip install everything without incompatibilities using:

But still, when I try to launch train_generator, I constantly get error messages:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

and I'm not even able to stop the script by pressing ctrl-c in the command prompt.

I'm now starting my third day trying to make this work... Please help: what did I do wrong? What should I try next? Thanks

maxpumperla commented 4 years ago

@VideoPac thanks, good to hear you like the book!

So this all boils down to multiprocessing issues on Windows, I'm afraid. As I don't have a Windows machine right now, it's difficult for me to help you directly. Can you point me to the exact script you're running? This explanation should help you (it's for PyTorch, but the root cause is the same):

https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection
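The same point in plain multiprocessing terms: on Windows, workers are started with the spawn method, and each child re-imports the script you launched (as a module named "__mp_main__", visible in the traceback further down). Any module-level code that creates a Pool therefore runs again in every child. That is why the guard belongs in the entry-point script that calls load_go_data, not inside parallel_processor. A minimal stdlib-only sketch; download_one and download_all are hypothetical stand-ins for dlgo's download/processing code, not its actual functions.

```python
import multiprocessing


def download_one(name):
    # Hypothetical worker standing in for the per-archive step. It must be
    # defined at module level so that spawned children can import it.
    return name.upper()


def download_all(names, processes=2):
    # Library-style code: creating a Pool inside a function is fine and
    # needs no guard of its own.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(download_one, names)


def main():
    print(download_all(["kgs-2001", "kgs-2002"]))


# The guard lives in the script you run: when a spawned child re-imports
# this file, __name__ is "__mp_main__", so main() is not called again there.
if __name__ == "__main__":
    main()
```

As the error text itself says, freeze_support() is only needed if the script is frozen into an executable.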

VideoPac commented 4 years ago

The script I'm trying to run is train_generator.py inside the code/examples folder.

But even if I just run the short script from listing 7.17 in the book:

from dlgo.data.parallel_processor import GoDataProcessor

processor = GoDataProcessor()
generator = processor.load_go_data('train', 100, use_generator=True)
print(generator.get_num_samples())
generator = generator.generate(batch_size=10)
X, y = generator.next()

Besides the fact that the next() method doesn't seem to be defined anywhere (?), I get the same RuntimeError popping up continuously, like this:

.....
KGS-2004-19-12106-.tar.gz 12106
KGS-2003-19-7582-.tar.gz 7582
KGS-2002-19-3646-.tar.gz 3646
KGS-2001-19-2298-.tar.gz 2298
Using TensorFlow backend.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\John\Desktop\xtest\deep_learning_and_the_game_of_go\code\dlgo\data\my_tests\generator_load.py", line 18, in <module>
    generator = processor.load_go_data('train', 100, use_generator=True)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\site-packages\dlgo-0.2-py3.5.egg\dlgo\data\parallel_processor.py", line 41, in load_go_data
    index.download_files()
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\site-packages\dlgo-0.2-py3.5.egg\dlgo\data\index_processor.py", line 58, in download_files
    pool = multiprocessing.Pool(processes=cores)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\context.py", line 118, in Pool
    context=self.get_context())
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\popen_spawn_win32.py", line 34, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 144, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\John\Anaconda3\envs\kaa4\lib\multiprocessing\spawn.py", line 137, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
>>> Reading cached index page
KGS-2019_04-19-1255-.tar.gz 1255
KGS-2019_03-19-1478-.tar.gz 1478
KGS-2019_02-19-1412-.tar.gz 1412
KGS-2019_01-19-2095-.tar.gz 2095
.....

maxpumperla commented 4 years ago

Right, but when I give you a link with a potential solution, why don't you at least try it? :D This is a multiprocessing issue, and the code (loading Go data) uses multiprocessing. So instead try:

from dlgo.data.parallel_processor import GoDataProcessor

def main():
    processor = GoDataProcessor()
    generator = processor.load_go_data('train', 100, use_generator=True)
    print(generator.get_num_samples())
    generator = generator.generate(batch_size=10)
    X, y = generator.next()

if __name__ == "__main__":
    main()

That would at least be good to confirm. p.s. next comes with Python generators. https://stackoverflow.com/questions/1073396/is-generator-next-visible-in-python-3-0
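On the .next() point: in Python 3, generator objects no longer have a .next() method (it was renamed __next__()), and the spelling that works in both Python 2.6+ and 3 is the built-in next(generator). A small sketch; generate_batches is a hypothetical stand-in for the dlgo generator, yielding (X, y) pairs as listing 7.17 expects.

```python
def generate_batches(samples, batch_size):
    """Hypothetical stand-in for the dlgo generator: loops over samples
    forever, yielding (X, y) batches of exactly batch_size items."""
    if len(samples) < batch_size:
        raise ValueError("need at least one full batch of samples")
    while True:
        # Step through full batches only; a trailing partial batch is skipped.
        for start in range(0, len(samples) - batch_size + 1, batch_size):
            chunk = samples[start:start + batch_size]
            X = [features for features, _ in chunk]
            y = [label for _, label in chunk]
            yield X, y


samples = [([i, i + 1], i % 2) for i in range(25)]
generator = generate_batches(samples, batch_size=10)

X, y = next(generator)        # built-in next(): works in Python 2.6+ and 3
# X, y = generator.next()     # AttributeError in Python 3
print(len(X), len(y))         # prints 10 10
```

Keeping to the built-in next() also fits the goal of supporting both Python 2 and 3.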

VideoPac commented 4 years ago

My bad: I was looking at the solution you gave me at the same time, of course, but couldn't figure out which part of the code to apply it to... I thought it should go somewhere in parallel_processor and couldn't make it work; in fact it was very obvious. And OK, generator.next() should be changed to next(generator), I get it. Anyway, thanks a lot for your help. I guess I'm almost there: I now have at least the first epoch running in train_generator. I still get a:

H5pyDeprecationWarning:
The default file mode will change to 'r' (read-only) in h5py 3.0.
To suppress this warning, pass the mode you need to h5py.File(),
or set the global default h5.get_config().default_file_mode,
or set the environment variable H5PY_DEFAULT_READONLY=1.

and a

KeyError: 'Cannot set attribute. Group with name "keras_version" exists.'

after the first epoch, but I'll try to fix those myself before asking for help :) Should I commit the changes to the chapter 7 branch once done?

maxpumperla commented 4 years ago

@VideoPac no worries, happy to help.

Yeah, if you could open a PR, that'd be amazing! (We should make sure, however, that the code works with both Python 2 and 3 if possible.)

Your h5py message is just a warning and can be ignored. The other one is googleable: https://github.com/keras-team/keras/issues/11276
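If you'd rather silence the h5py warning than just ignore it, the warning text itself lists the options: pass an explicit mode to h5py.File(), or set the environment variable H5PY_DEFAULT_READONLY=1. When the h5py.File() call happens inside library code you don't control, a standard-library warnings filter is another option. A minimal sketch; the helper name is hypothetical, and the pattern assumes only that the message begins as in the log above.

```python
import warnings


def silence_h5py_mode_warning():
    # Ignore only warnings whose message starts with h5py's "default file
    # mode" notice (leading whitespace allowed); everything else still shows.
    warnings.filterwarnings(
        "ignore",
        message=r"\s*The default file mode will change",
    )


# Call once near the top of the training script, before any model saving.
silence_h5py_mode_warning()
```

The message argument is a regular expression matched case-insensitively against the start of the warning text, so unrelated warnings are unaffected.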

YamilVidal commented 3 years ago

(Quoting @maxpumperla's reply above: wrap the code in a main() function and call it from an if __name__ == "__main__": block.)

Hi all! I'm having the same (or a similar) issue. I'm using PyCharm with Python 3.8 on a Windows machine.

If I use the solution quoted above:

from dlgo.data.parallel_processor import GoDataProcessor

def main():
    processor = GoDataProcessor()
    generator = processor.load_go_data('train', 100, use_generator=True)
    # ... more code ...

if __name__ == "__main__":
    main()

When I then run the script (green play button), it runs fine and I can train models. But I would like to run the code in the console, line by line, which is a much better way to learn.

If I instead run the code in the console line by line, then as soon as I run

generator = processor.load_go_data('train', 1, use_generator=True)

I get an endless stream of repetitions of the following:

File "<string>", line 1, in <module>
  File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 264, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "C:\Users\LV4\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 234, in _get_code_from_file
    with io.open_code(decoded_path) as f:
OSError: [Errno 22] Invalid argument: 'C:\\Data\\Mehlernas\\DatosCurrent\\DLGO - code\\<input>'

Note that I don't get this error:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

I do get that error if I run the script (green play button) without the def main(): wrapper, like this:

from dlgo.data.parallel_processor import GoDataProcessor

processor = GoDataProcessor()
generator = processor.load_go_data('train', 100, use_generator=True)

Does anyone know if there is a way to get the code working in the console, line by line? I guess I could just use the def main(): approach and step through it with the debugger... Thanks!!