vincentberaud / Minecraft-Reinforcement-Learning

Deep Recurrent Q-Learning vs Deep Q Learning on a simple Partially Observable Markov Decision Process with Minecraft

experience_buffer() - sample: "ValueError: Sample larger than population or is negative" #6

Closed osbornep closed 4 years ago

osbornep commented 5 years ago

Following on from my previous issue, when I attempt to run the training episodes, once the agent goes past the "pre_train_steps", I am getting the following error:

[screenshot: traceback ending in "ValueError: Sample larger than population or is negative"]

This has been from simply replicating all steps given with the exact code copied from the notebook. The only small change I have made is to change the following parameters to test in reasonable time:

Is this a known issue, or should I investigate further?

Thanks

ClementRomac commented 5 years ago

Hi,

I'm glad you've resolved your previous issue.

For this one: from your error, I understand that you've set the NetType to "Convolutional", so you are using the "experience_buffer" to store steps. Each time you train your network (every "update_freq" timesteps), you draw a sample of size "batch_size" from this experience buffer (if you haven't changed it, it is 32). However, training only starts once "pre_train_steps" steps have been taken.

In your case, this means that the first time you try to sample 32 steps from the experience_buffer, it only holds the 10 steps stored so far (pre_train_steps having just been reached), so the requested sample is larger than the population.
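The error itself comes from Python's `random.sample`, which this kind of replay buffer typically wraps. A minimal sketch reproducing the situation (the `ExperienceBuffer` class here is illustrative, not the repo's exact code):

```python
import random

class ExperienceBuffer:
    """Illustrative list-backed replay buffer (not the repo's exact implementation)."""
    def __init__(self, buffer_size=50000):
        self.buffer = []
        self.buffer_size = buffer_size

    def add(self, experience):
        # Drop the oldest entry once the buffer is full.
        if len(self.buffer) + 1 > self.buffer_size:
            self.buffer = self.buffer[1:]
        self.buffer.append(experience)

    def sample(self, size):
        # random.sample raises "ValueError: Sample larger than
        # population or is negative" when size > len(self.buffer).
        return random.sample(self.buffer, size)

buffer = ExperienceBuffer()
for step in range(10):              # only pre_train_steps = 10 steps stored
    buffer.add(("state", "action", "reward"))

try:
    buffer.sample(32)               # batch_size = 32 > 10 stored steps
except ValueError as e:
    print(e)                        # -> Sample larger than population or is negative
```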

I believe you can solve this problem by making sure pre_train_steps >= batch_size (either by increasing pre_train_steps or by decreasing batch_size).
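A quick sketch of that guard, with illustrative variable names (the training-loop structure here is only assumed, not copied from the notebook):

```python
import random

batch_size = 32
pre_train_steps = 10        # the value that triggered the error

# Guard: make sure at least one full batch is stored before training starts.
pre_train_steps = max(pre_train_steps, batch_size)

buffer = []                 # stand-in for experience_buffer's internal list
train_batch = None
for total_steps in range(100):
    buffer.append(total_steps)  # one "experience" per timestep
    if total_steps >= pre_train_steps and len(buffer) >= batch_size:
        # Safe: the buffer now holds at least batch_size entries.
        train_batch = random.sample(buffer, batch_size)

print(len(train_batch))     # 32
```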

I hope I'm clear enough in these explanations. If not, just tell me.

Clément