Open yoshih-2go opened 4 years ago
This error may be due to the handling of handicap games. If you use AlphaGoEncoder, it seems best to skip handicap games.

In process_zip() of processor.py:

    sgf = Sgf_game.from_string(sgf_content)  # <3>
    if sgf.get_handicap() is not None and sgf.get_handicap() != 0:
        print('skipping handicapped game ...')
        continue
    game_state, first_move_done = self.get_handicap(sgf)  # <4>
The error disappeared. Thank you very much. I have modified both parallel_processor.py and processor.py, as you suggested.
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "E:\Python36\lib\multiprocessing\pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "E:\Python36\lib\multiprocessing\pool.py", line 44, in mapstar
        return list(map(*args))
      File "F:\work\code\dlgo\data\parallel_processor.py", line 26, in worker
        clazz(encoder=encoder).process_zip(zip_file, data_file_name, game_list)
      File "F:\work\code\dlgo\data\parallel_processor.py", line 77, in process_zip
        features = np.zeros(feature_shape)
    MemoryError
    """
The above exception was the direct cause of the following exception:
    Traceback (most recent call last):
      File "alphago_policy_sl.py", line 59, in <module>
This is probably due to a different cause. Put

    print('feature_shape >', feature_shape)

before

    features = np.zeros(feature_shape)

and check how much memory you are trying to allocate. Depending on the case, the following may be better:

    features = np.zeros(feature_shape, dtype='float32')
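To estimate the allocation up front, multiply the element count by the dtype's item size. A rough sketch below; the AlphaGo encoder's per-position shape (49, 19, 19) is from the thread, while the ~200 positions per game figure is only an illustrative assumption:

```python
import numpy as np

# Hypothetical sizing: 1000 games at ~200 positions each (assumption).
games = 1000
positions = games * 200
feature_shape = (positions, 49, 19, 19)

elements = int(np.prod(feature_shape))
for dtype in ('float64', 'float32', 'int8'):
    gib = elements * np.dtype(dtype).itemsize / 2**30
    print(f'{dtype}: {gib:.1f} GiB')
```

Note that np.zeros without a dtype argument defaults to float64, which is why the original allocation was so large.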
Thank you very much again. I followed your advice:

    feature_shape > [ x 49 19 19]

where x increases as num_games increases.

My PC has 16GB of memory.

    num_games = 10000 -> memory error
    num_games = 1000  -> no memory error / memory usage ~ 7.5GB
    num_games = 3000  -> no memory error / memory usage ~ 15.7GB

I guess it needs about 3-4GB per 1000 games.
With features = np.zeros(feature_shape, dtype='int8'), the memory error doesn't happen with num_games = 10000. Training speed is 18% faster.
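For reference, switching from the default float64 to int8 shrinks the array by a factor of eight, since each element goes from 8 bytes to 1 (the shape below is just an example):

```python
import numpy as np

# np.zeros defaults to float64 (8 bytes per element); int8 uses 1 byte.
a64 = np.zeros((100, 49, 19, 19))
a8 = np.zeros((100, 49, 19, 19), dtype='int8')

print(a64.dtype, a64.nbytes)
print(a8.dtype, a8.nbytes)  # 8x smaller
```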
good
As you noticed, in that implementation some entries of features and labels are left at their zero initial values: the arrays are sized for all games, but handicap games are skipped mid-loop.
    sgf = Sgf_game.from_string(sgf_content)  # <3>
    if sgf.get_handicap() is not None and sgf.get_handicap() != 0:
        print('skipping handicapped game ...')
        continue
    game_state, first_move_done = self.get_handicap(sgf)  # <4>
Please calculate the appropriate size first, as follows (in processor.py):

    # skip handicapped games when counting examples
    game_list_omit_handicap = []
    for index in game_list:
        name = name_list[index + 1]
        if not name.endswith('.sgf'):
            raise ValueError(name + ' is not a valid sgf')
        sgf_content = zip_file.extractfile(name).read()
        sgf = Sgf_game.from_string(sgf_content)
        if sgf.get_handicap() is not None and sgf.get_handicap() != 0:
            print('skipping handicapped game ...')
        else:
            game_list_omit_handicap.append(index)

    total_examples = self.num_total_examples(zip_file, game_list, name_list)
    print('total_examples >', total_examples)
    total_examples = self.num_total_examples(zip_file, game_list_omit_handicap, name_list)
    print('total_examples (omit handicap) >', total_examples)

    shape = self.encoder.shape()
    feature_shape = np.insert(shape, 0, np.asarray([total_examples]))
    print('feature_shape >', feature_shape)
    # features = np.zeros(feature_shape, dtype='float32')
    features = np.zeros(feature_shape, dtype='float16')
    # print('features[0] >', features[0])
    labels = np.zeros((total_examples,))

    counter = 0
    for index in game_list_omit_handicap:  # iterate only over non-handicap games
        ...
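As a small stand-alone illustration of how feature_shape is built above, np.insert prepends the example count to the encoder's per-position shape (the numbers here are only examples):

```python
import numpy as np

shape = np.array([49, 19, 19])  # e.g. the AlphaGo encoder's per-position shape
total_examples = 12345          # hypothetical example count
feature_shape = np.insert(shape, 0, np.asarray([total_examples]))
print(feature_shape)
```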
I would like to report a result of executing alphago_policy_sl.py (ch. 13). I hope to learn why I got the error message below from the original Python script.
modify[1]: necessary on Windows 10 when using multiprocessing?

    if __name__ == '__main__':  # insert after 'import h5py'

I got the error message cited at the bottom. I wonder if it is related to multiprocessing, but the message disappears when another encoder is used, as described below.
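For context: on Windows, multiprocessing uses the spawn start method, which re-imports the main module in every worker process, so pool creation must be guarded or the workers spawn recursively. A minimal, self-contained illustration (not the book's script):

```python
import multiprocessing

def square(x):
    # Work executed inside a pool worker process.
    return x * x

def run():
    # Creating the pool only under the guard below avoids recursive
    # spawning on Windows, where workers re-import this module.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == '__main__':
    print(run())
```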
modify[2]:

    # from dlgo.encoders.alphago import AlphaGoEncoder
    from dlgo.encoders.sevenplane import SevenPlaneEncoder

modify[3]:

    # encoder = AlphaGoEncoder()
    encoder = SevenPlaneEncoder((rows, cols))
    Epoch 1/200
    15328/15328 [==============================] - 3900s 254ms/step - loss: 4.2390 - acc: 0.1139 - val_loss: 3.8673 - val_acc: 0.1338
    Epoch 2/200
     4547/15328 [=======>......................] - ETA: 33:51 - loss: 3.8487 - acc: 0.1413

alphago_sl_policy_1.h5 (26,264KB) is created at this point.
My PC:

    Windows 10 64-bit
    Python 3.6
    tensorflow-gpu 1.12.0
    Keras 2.2.4
error message:

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "E:\Python36\lib\multiprocessing\pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "E:\Python36\lib\multiprocessing\pool.py", line 44, in mapstar
        return list(map(*args))
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 26, in worker
        clazz(encoder=encoder).process_zip(zip_file, data_file_name, game_list)
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 101, in process_zip
        features[counter] = self.encoder.encode(game_state)
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\encoders\alphago.py", line 82, in encode
        if game_state.is_valid_move(move):
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\goboard_fast.py", line 360, in is_valid_move
        if self.is_over():
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\goboard_fast.py", line 372, in is_over
        if self.last_move.is_resign:
    AttributeError: 'tuple' object has no attribute 'is_resign'
    """
The above exception was the direct cause of the following exception:
    Traceback (most recent call last):
      File "alphago_policy_sl_step1.py", line 17, in <module>
        generator = processor.load_go_data('train', num_games, use_generator=True)
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 46, in load_go_data
        self.map_to_workers(data_type, data)  # <1>
      File "F:\IGO\DeepLearningandtheGameofGo\deep_learning_and_the_game_of_go-chapter_13\code\dlgo\data\parallel_processor.py", line 189, in map_to_workers
        _ = p.get()
      File "E:\Python36\lib\multiprocessing\pool.py", line 644, in get
        raise self._value
    AttributeError: 'tuple' object has no attribute 'is_resign'
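The traceback shows that, for handicap games, last_move ends up as a plain tuple rather than a Move object with an is_resign attribute, which is why skipping those games avoids the crash. A minimal stand-alone illustration of the failure mode and a defensive check; Move here is a simplified stand-in, not dlgo's actual class:

```python
from collections import namedtuple

# Simplified stand-in for dlgo's Move class (illustration only).
Move = namedtuple('Move', ['point', 'is_resign'])

def is_over(last_move):
    # A raw (row, col) tuple has no is_resign attribute, which is
    # exactly what raised AttributeError in the traceback above.
    if last_move is None or not hasattr(last_move, 'is_resign'):
        return False
    return last_move.is_resign

print(is_over((3, 3)))               # plain tuple: the crashing case, now handled
print(is_over(Move((3, 3), False)))  # normal play move
print(is_over(Move(None, True)))     # resignation ends the game
```

Filtering out handicap games before encoding, as shown earlier in the thread, remains the simpler fix.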