henrycharlesworth / big2_PPOalgorithm

Application of proximal policy optimization algorithm to the card game Big 2 using Tensorflow

parameter of network is Nan #5

Open BrainWWW opened 6 years ago

BrainWWW commented 6 years ago

Hi, I'm doing some research on multi-agent learning, and Big 2 is a really good platform for testing multi-agent algorithms! But I ran into a problem with your code: after a long period of updates, the parameters of the neural network become NaN. I then took the most recent dumped parameters that were not yet NaN and trained the network again. After another long period of updates, the parameters became NaN again. I debugged it and I think it is caused by exploding or vanishing gradients. Did you run into the same problem? How did you fix it? Yours sincerely!

henrycharlesworth commented 6 years ago

Hi! Yes initially when I trained it (to get the parameters which are included in the repository) this never happened - this was using 150 million training steps. But when I went on to train it for longer I did find that eventually I started getting NaNs. My extremely inelegant solution was to just save the previous set of parameters before each update, and then check to see if any parameters become NaN. If they do, just set the parameters back to the previous ones. I would like to get to the bottom of what's causing this at some point though, but it's quite hard to pin down as it seems to happen so rarely. This was also one of my first projects using tensorflow and so I'm pretty inexperienced at debugging these kinds of issues.

Anyway, I'm pretty certain it's a problem with the neural network (as you say, probably a vanishing/exploding gradient) and not the game logic itself - so if you want to use it for trying out any other multi agent algorithms it should be fine. If you do that, I'd be very interested to see how well they do against the agent I trained!
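
The save-and-restore workaround described above could look roughly like the following. This is a minimal sketch, not the repository's code: the names `has_nan`, `guarded_update` and `update_fn` are hypothetical, and parameters are modelled as a dict of plain Python lists so the example runs without TensorFlow. In practice the same check would be applied to the arrays fetched from the network.

```python
import copy
import math


def has_nan(params):
    """Return True if any value in any parameter list is NaN."""
    return any(math.isnan(v) for values in params.values() for v in values)


def guarded_update(params, update_fn):
    """Apply update_fn to params, rolling back if the result contains NaN.

    params: dict mapping a name to a flat list of floats (hypothetical layout).
    update_fn: hypothetical callable that returns the updated parameters.
    Returns (parameters to keep, whether a rollback happened).
    """
    backup = copy.deepcopy(params)  # snapshot taken before the update
    new_params = update_fn(params)
    if has_nan(new_params):
        return backup, True  # discard the bad update
    return new_params, False
```

Counting how often the rollback fires is a cheap way to confirm that the NaNs really are as rare as they seem.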

BrainWWW commented 6 years ago

Thank you for your reply! I ran some experiments afterwards and got some really weird results. On CPU it can be trained for more than 120,000 updates (and it has not stopped yet), but on GPU it can only be trained for about 20,000 to 50,000 updates before the parameters become NaN.

The first time I ran the code, it failed (48 processes caused a memory error).
So I added the code below to try to manage GPU memory:

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.7
    config.gpu_options.allow_growth = True

With this it no longer reports an error, but after 20,000 updates I started getting NaNs. Then I reduced the number of processes. It works, but the network only got through 50,000 updates before becoming NaN.

The GPU I used is a 1080 Ti with 11 GB. I have never run into this problem before and haven't found a similar report... By the way, I debugged where it becomes NaN: the parameters become NaN after backpropagation. Before that, the loss is a valid number, so there is no problem in calculating the loss function.
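
Since the loss is still a valid number and the NaNs only show up after backpropagation, one way to narrow this down further would be to inspect the gradients before they are applied and report the first one that is not finite. The following is only a sketch under assumptions: `first_non_finite` is a hypothetical helper, and the gradients are assumed to have been fetched as (name, flat list of floats) pairs rather than TensorFlow tensors.

```python
import math


def first_non_finite(named_grads):
    """Return (name, index) of the first NaN or inf entry, or None if all are finite.

    named_grads: list of (name, flat list of floats) pairs in layer order
    (a hypothetical layout chosen for this sketch).
    """
    for name, values in named_grads:
        for i, v in enumerate(values):
            if not math.isfinite(v):
                return name, i
    return None
```

Checking for inf as well as NaN matters here, because an overflowing gradient typically shows up as inf one step before the parameters turn into NaN.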

henrycharlesworth commented 6 years ago

Yeah that's very peculiar. For me, I can train it on my GPU (1070) for more updates than that before getting NaNs, but they still come eventually. I really don't know what's going on tbh, if I have some time I'll try and debug it properly but it's just weird! And as I say, a workaround is just to save the previous parameter values and check for NaNs when you update - since it happens so rarely this works, although I appreciate that it's not ideal!

BrainWWW commented 6 years ago

Maybe there is something wrong with TensorFlow or cuDNN, I will check it : ) Just one more question about big2game.py: from line 194 to line 260, when you update the previous hand info (phInd), doesn't the index need to have 1 subtracted from it? E.g. in line 200, 'self.neuralNetworkInputs[nPlayer][phInd + 19 ] = 1' should be changed to 'self.neuralNetworkInputs[nPlayer][phInd + 19 - 1 ] = 1', right?

henrycharlesworth commented 6 years ago

OK well I appreciate you spending the time trying to figure it out, let me know if you're able to fix the issue!

On the second question, I don't think so. To be honest this part of the code is horribly written so it's hard to tell what's going on but the numbers should be chosen in a way which means you don't need to subtract anything. At least I'm fairly sure this is the case, I tested it quite rigorously with the GUI (see generateGUI.py) which has a second window showing the neural network inputs at each point in the game - and I'm pretty sure they are correct. I could be wrong though, as I did write this part of the code about a year ago now!

BrainWWW commented 6 years ago

Yeah, absolutely! : )

I understand, because the game logic is really hard to read. Since you tested it with the GUI, that part should be right.

Big 2 is a really good test platform. If I find any other issue (maybe it won't be one), I will let you know : ) Thank you sincerely for your reply!