maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go

chapter 13: alphago_policy_sl.py #52

Open yoshih-2go opened 4 years ago

yoshih-2go commented 4 years ago

I would like to report a result of executing alphago_policy_sl.py (ch. 13). I hope to learn why I got the error message from the original Python script.

modify[1] (necessary for Windows 10 when using multiprocessing?): insert `if __name__ == '__main__':` after `import h5py`.

python alphago_policy_sl.py
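For reference, a minimal, self-contained sketch of the entry-point guard that multiprocessing requires on Windows (the worker function here is a toy stand-in, not code from the book):

```python
import multiprocessing

def square(n):
    return n * n

def main():
    # On Windows, Pool workers are spawned as fresh interpreters that
    # re-import this module, so the Pool must only be created under the
    # __main__ guard; otherwise each worker tries to start its own pool.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]

if __name__ == '__main__':
    main()
```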

I got the error message cited at the bottom. I wonder if it is related to multiprocessing, but the message disappears when another encoder is used, as described below.

modify[2]: swap the encoder import:

    # from dlgo.encoders.alphago import AlphaGoEncoder
    from dlgo.encoders.sevenplane import SevenPlaneEncoder

modify[3]: swap the encoder instantiation:

    # encoder = AlphaGoEncoder()
    encoder = SevenPlaneEncoder((rows, cols))

After running `python alphago_policy_sl.py` with these changes, the script is running at present:

    Epoch 1/200
    15328/15328 [==============================] - 3900s 254ms/step - loss: 4.2390 - acc: 0.1139 - val_loss: 3.8673 - val_acc: 0.1338
    Epoch 2/200
     4547/15328 [=======>......................] - ETA: 33:51 - loss: 3.8487 - acc: 0.1413

alphago_sl_policy_1.h5 (26,264 KB) is created at this point.


My PC


- Windows 10, 64-bit
- Python 3.6
- tensorflow-gpu 1.12.0
- Keras 2.2.4


error message


    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "E:\Python36\lib\multiprocessing\pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "E:\Python36\lib\multiprocessing\pool.py", line 44, in mapstar
        return list(map(*args))
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 26, in worker
        clazz(encoder=encoder).process_zip(zip_file, data_file_name, game_list)
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 101, in process_zip
        features[counter] = self.encoder.encode(game_state)
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\encoders\alphago.py", line 82, in encode
        if game_state.is_valid_move(move):
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\goboard_fast.py", line 360, in is_valid_move
        if self.is_over():
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\goboard_fast.py", line 372, in is_over
        if self.last_move.is_resign:
    AttributeError: 'tuple' object has no attribute 'is_resign'
    """

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "alphago_policy_sl_step1.py", line 17, in <module>
        generator = processor.load_go_data('train', num_games, use_generator=True)
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 46, in load_go_data
        self.map_to_workers(data_type, data)  # <1>
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 189, in map_to_workers
        _ = p.get()
      File "E:\Python36\lib\multiprocessing\pool.py", line 644, in get
        raise self._value
    AttributeError: 'tuple' object has no attribute 'is_resign'

darecophoenixx commented 4 years ago

This error may be due to the handling of handicap games. If you use the AlphaGoEncoder, it seems best to skip games with handicap.

In process_zip() of processor.py:

            sgf = Sgf_game.from_string(sgf_content)  # <3>
            if sgf.get_handicap() is not None and sgf.get_handicap() != 0:
                print('skipping handicapped game ...')
                continue
            game_state, first_move_done = self.get_handicap(sgf)  # <4>
yoshih-2go commented 4 years ago

The error disappeared. Thank you very much. I modified both parallel_processor.py and processor.py as you suggested.

yoshih-2go commented 4 years ago

It was too early to judge; errors still appear.

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "E:\Python36\lib\multiprocessing\pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "E:\Python36\lib\multiprocessing\pool.py", line 44, in mapstar
        return list(map(*args))
      File "F:\work\code\dlgo\data\parallel_processor.py", line 26, in worker
        clazz(encoder=encoder).process_zip(zip_file, data_file_name, game_list)
      File "F:\work\code\dlgo\data\parallel_processor.py", line 77, in process_zip
        features = np.zeros(feature_shape)
    MemoryError
    """

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "alphago_policy_sl.py", line 59, in <module>
        main()
      File "alphago_policy_sl.py", line 22, in main
        generator = processor.load_go_data('train', num_games, use_generator=True)
      File "F:\work\code\dlgo\data\parallel_processor.py", line 46, in load_go_data
        self.map_to_workers(data_type, data)  # <1>
      File "F:\work\code\dlgo\data\parallel_processor.py", line 197, in map_to_workers
        _ = p.get()
      File "E:\Python36\lib\multiprocessing\pool.py", line 644, in get
        raise self._value
    MemoryError

darecophoenixx commented 4 years ago

This is probably due to another cause.

Put

    print('feature_shape >', feature_shape)

before

    features = np.zeros(feature_shape)

and check how much memory you are trying to allocate. Depending on the case, the following may be better:

    features = np.zeros(feature_shape, dtype='float32')
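To see how much the dtype matters, you can compute the allocation size by hand. This sketch uses a hypothetical example count of 20,000 encoded positions with the 49x19x19 AlphaGo feature planes:

```python
from functools import reduce
import operator

def array_bytes(shape, itemsize):
    """Bytes needed for a dense array: product of the dims times the per-element size."""
    return reduce(operator.mul, shape, 1) * itemsize

# np.zeros defaults to float64 (8 bytes per element); float32 halves that,
# and int8 cuts it to one eighth.
shape = (20000, 49, 19, 19)
for dtype, itemsize in [('float64', 8), ('float32', 4), ('int8', 1)]:
    gib = array_bytes(shape, itemsize) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```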
yoshih-2go commented 4 years ago

Thank you very much again. I followed your advice:

    feature_shape > [ x 49 19 19]

where x increases as num_games increases.

My PC has 16 GB of memory:

- num_games = 10000 → memory error
- num_games = 1000 → no memory error / memory usage about 7.5 GB
- num_games = 3000 → no memory error / memory usage about 15.7 GB

I estimate roughly 3-4 GB per 1000 games.

yoshih-2go commented 4 years ago

With `features = np.zeros(feature_shape, dtype='int8')`, the memory error doesn't happen with num_games = 10000. Training is also about 18% faster.

darecophoenixx commented 4 years ago

good

darecophoenixx commented 4 years ago

As you may have noticed, with that implementation some features and labels are never filled in: total_examples still counts the skipped handicap games, so the tail of the arrays stays zero.

            sgf = Sgf_game.from_string(sgf_content)  # <3>
            if sgf.get_handicap() is not None and sgf.get_handicap() != 0:
                print('skipping handicapped game ...')
                continue
            game_state, first_move_done = self.get_handicap(sgf)  # <4>

Please calculate the appropriate size first, as follows (in processor.py):

        # check for handicapped games
        game_list_omit_handicap = []
        for index in game_list:
            name = name_list[index + 1]
            if not name.endswith('.sgf'):
                raise ValueError(name + ' is not a valid sgf')
            sgf_content = zip_file.extractfile(name).read()
            sgf = Sgf_game.from_string(sgf_content)
            if sgf.get_handicap() is not None and sgf.get_handicap() != 0:
                print('skipping handicapped game ...')
            else:
                game_list_omit_handicap.append(index)

        total_examples = self.num_total_examples(zip_file, game_list, name_list)
        print('total_examples >', total_examples)
        total_examples = self.num_total_examples(zip_file, game_list_omit_handicap, name_list)
        print('total_examples (omit handicap) >', total_examples)

        shape = self.encoder.shape()
        feature_shape = np.insert(shape, 0, np.asarray([total_examples]))
        print('feature_shape >', feature_shape)
        #features = np.zeros(feature_shape, dtype='float32')
        features = np.zeros(feature_shape, dtype='float16')
        #print('features[0] >', features[0])
        labels = np.zeros((total_examples,))

        counter = 0
        for index in game_list_omit_handicap: # <<<<<<<<<<
            ...
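The two-pass pattern above (filter out handicap games first, then allocate arrays of exactly the right size) can be sketched with toy stand-ins for the SGF records; real code would parse the zip with Sgf_game as shown:

```python
import numpy as np

# Toy records: one handicap game that should be skipped.
records = [{"moves": 5, "handicap": 0},
           {"moves": 3, "handicap": 2},   # handicap game: skipped
           {"moves": 4, "handicap": 0}]

# Pass 1: keep only non-handicap games and count the examples they produce.
kept = [r for r in records if not r["handicap"]]
total_examples = sum(r["moves"] for r in kept)

# Pass 2: allocate exactly that many rows, then fill them in a loop over `kept`.
features = np.zeros((total_examples, 49, 19, 19), dtype='int8')
labels = np.zeros((total_examples,))
print(features.shape)  # (9, 49, 19, 19)
```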