karpathy / convnetjs

Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
MIT License

average_loss_window has a value of NaN #114

Closed RyleyGG closed 4 years ago

RyleyGG commented 4 years ago

Hello,

I've recently started using this library and it has been exciting so far. However, it appears that the backward() function is not returning a proper value and as such I'm not really able to train anything.

Specifically, the variable avcost gets a value of NaN. Tracing the issue back to the policy function, maxval is also NaN, apparently because every entry of action_values.w is NaN, which in turn seems to be because the layers of value_net contain Float64Array values that are NaN pretty much across the board. I'm using the layer setup from the rldemo demonstration, so I'm not sure how to get past this.

Any help is appreciated. Sorry for any formatting issues, I don't regularly use GitHub.

Thanks.

NoamGaash commented 4 years ago

Dear Ryley, welcome to GitHub. Could you please provide a minimal code example? Thanks.

RyleyGG commented 4 years ago

@NoamGaash Of course.

Here is the backward function I was referring to, found in build/deepqlearn.js https://github.com/karpathy/convnetjs/blob/4c3358a315b4d71f31a0d532eb5d1700e9e592ee/build/deepqlearn.js#L213-L258

Specifically, these are the lines I believe are associated with my issue: https://github.com/karpathy/convnetjs/blob/4c3358a315b4d71f31a0d532eb5d1700e9e592ee/build/deepqlearn.js#L242-L258

After avcost += loss.loss, avcost has a value of NaN. loss is defined here: https://github.com/karpathy/convnetjs/blob/4c3358a315b4d71f31a0d532eb5d1700e9e592ee/build/deepqlearn.js#L254

loss.cost_loss, loss.loss, and loss.softmax_loss all have a value of NaN for me. loss is partially dependent on the value of ystruct -- I found that ystruct.val was also returning NaN. The variable used for ystruct.val, r, is defined here: https://github.com/karpathy/convnetjs/blob/4c3358a315b4d71f31a0d532eb5d1700e9e592ee/build/deepqlearn.js#L252

I found that e.reward0 and this.gamma both had the values they should, but maxact.value did not, again returning NaN. maxact is defined here: https://github.com/karpathy/convnetjs/blob/4c3358a315b4d71f31a0d532eb5d1700e9e592ee/build/deepqlearn.js#L251

e.state1 has a valid value from what I can tell, and so it is my belief that my issue lies somewhere in the policy function, which is defined here: https://github.com/karpathy/convnetjs/blob/4c3358a315b4d71f31a0d532eb5d1700e9e592ee/build/deepqlearn.js#L139-L151

One of the values policy returns is maxval, which is also getting a value of NaN. maxval is defined as var maxval = action_values.w[0];, and in my case action_values.w is returning all NaN values.

action_values is defined here: var action_values = this.value_net.forward(svol); svol appears to have correct data from what I can tell, whereas this.value_net does not. Logging this.value_net lists several layers, and breaking them down shows a result like the attached screenshot, with Float64Array entries that are all NaN.

So as you can see, plenty of NaN values. This is where I'm stumped. I'm assuming the issue is layer-related, but I'm using the exact same layer setup as the rldemo, so I'm not really sure where to go from here. Here is how I'm setting the network up:

const temporal_window = 1; // amount of temporal memory. 0 = agent lives in-the-moment.
    const network_size = inputNum*temporal_window + actionNum*temporal_window + inputNum;

    // the value function network computes a value of taking any of the possible actions
    // given an input state. Here we specify one explicitly the hard way
    // but user could also equivalently instead use opt.hidden_layer_sizes = [20,20]
    // to just insert simple relu hidden layers.
    let layer_defs = [];
    layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:network_size});
    layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
    layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
    layer_defs.push({type:'regression', num_neurons:actionNum});

    // options for the Temporal Difference learner that trains the above net
    // by backpropping the temporal difference learning rule.
    let tdtrainer_options = {learning_rate:0.001, momentum:0.0, batch_size:64, l2_decay:0.01};

    //opt is the set of all configurable options related to the bot
    let opt = {}; //Object holding the various options
    opt.temporal_window = temporal_window; //The amount of "temporal memory" the AI has, in terms of "time steps"
    opt.experience_size = 1500; //size of experience replay memory
    opt.start_learn_threshold = 25; //number of examples in experience replay memory before AI begins learning
    opt.gamma = 0.7; //Determines how much the AI plans ahead, on a scale of 0 to 1.
    opt.learning_steps_total = 50000; //Number of total steps to learn for
    opt.learning_steps_burnin = 25; //For the above number of steps, how many should be completely random at the beginning of the learning process?
    opt.epsilon_min = 0.03; //Epsilon determines the amount of randomness the AI will implement over time. Set to 0 for AI to only use learned experiences deep into the learning process
    opt.epsilon_test_time = 0.03; //what epsilon to use at test time? (i.e. when learning is disabled)
    opt.layer_defs = layer_defs;
    opt.tdtrainer_options = tdtrainer_options;
    brain = new deepqlearn.Brain(inputNum, actionNum, opt);

The other properties of the brain object, such as the average_reward_window, etc. all seem to be working properly, and I'm not getting any warnings/errors into the console. Whenever I call brain.backward(), it is with a proper value for reward such as 50 or -50.
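In case it helps with reproducing this, a quick way to confirm the NaNs programmatically might be something like the sketch below. The countNaNParams helper is hypothetical (not part of the library) and assumes convnetjs's standard Net.getParamsAndGrads() API:

    // Hypothetical debugging helper: walk every trainable parameter of a
    // convnetjs Net and count how many entries are NaN.
    function countNaNParams(net) {
        let nanCount = 0, total = 0;
        const paramsAndGrads = net.getParamsAndGrads(); // [{params, grads, ...}, ...]
        for (let i = 0; i < paramsAndGrads.length; i++) {
            const p = paramsAndGrads[i].params;
            for (let j = 0; j < p.length; j++) {
                total++;
                if (isNaN(p[j])) { nanCount++; }
            }
        }
        return nanCount + ' of ' + total + ' parameters are NaN';
    }

    console.log(countNaNParams(brain.value_net));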

NoamGaash commented 4 years ago

By "minimal code example" I was referring to this definition: https://stackoverflow.com/help/minimal-reproducible-example

I'm not sure I can reproduce your problem based on the information you gave me.

Have you verified that the brain.backward function receives a numeric input of type 'number' (not a Vol)?
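For example, a check along these lines right before the call site (just a sketch):

    // Sketch: confirm the reward is a plain finite number before handing it
    // to the learner.
    console.assert(typeof reward === 'number' && isFinite(reward),
        'reward should be a finite number, got: ' + reward);
    brain.backward(reward);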

RyleyGG commented 4 years ago

@NoamGaash Yes, I'm sure the function is receiving an input of type number. I have validated this by checking the type of the reward immediately before calling brain.backward, as well as checking the value in the backward function itself; both cases returned 'number'.

Also, I apologize for misunderstanding your last request. You should be able to reproduce the issue with this code:

<html>
<head>

    <script src = "build/convnet.js"></script>
    <script src = "build/deepqlearn.js"></script>
    <script src = "build/util.js"></script>
    <script src = "build/vis.js"></script>
    <script type = "text/javascript">

    inputNum = 4;
    actionNum = 5;
    let brain;
    let gameInfo = [];
    let decision;
    let reward = 0;
    let opt = {}; //Object holding the various options

    function initNet()
    {
        const temporal_window = 1; // amount of temporal memory. 0 = agent lives in-the-moment. 
        const network_size = inputNum*temporal_window + actionNum*temporal_window + inputNum;

        // the value function network computes a value of taking any of the possible actions
        // given an input state. Here we specify one explicitly the hard way
        // but user could also equivalently instead use opt.hidden_layer_sizes = [20,20]
        // to just insert simple relu hidden layers.
        let layer_defs = [];
        layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:network_size});
        layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
        layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
        layer_defs.push({type:'regression', num_neurons:actionNum});

        // options for the Temporal Difference learner that trains the above net
        // by backpropping the temporal difference learning rule.
        let tdtrainer_options = {learning_rate:0.001, momentum:0.0, batch_size:64, l2_decay:0.01};

        //opt is the set of all configurable options related to the bot

        opt.temporal_window = temporal_window; //The amount of "temporal memory" the AI has, in terms of "time steps"
        opt.experience_size = 1500; //size of experience replay memory
        opt.start_learn_threshold = 25; //number of examples in experience replay memory before AI begins learning
        opt.gamma = 1; //Determines how much the AI plans ahead, on a scale of 0 to 1.
        opt.learning_steps_total = 50000; //Number of total steps to learn for
        opt.learning_steps_burnin = 25; //For the above number of steps, how many should be completely random at the beginning of the learning process?
        opt.epsilon_min = 0; //Epsilon determines the amount of randomness the AI will implement over time. Set to 0 for AI to only use learned experiences deep into the learning process
        opt.epsilon_test_time = 0; //what epsilon to use at test time? (i.e. when learning is disabled)
        opt.layer_defs = layer_defs;
        opt.tdtrainer_options = tdtrainer_options;
        brain = new deepqlearn.Brain(inputNum, actionNum, opt);
    }
    </script>

</head>

<body>

<script type = 'text/javascript'>
    initNet();
    function refreshBot() //Refreshes the AI with new information regarding the gamestate and applies rewards
    {
        gameInfo = [Math.random(), Math.random(), Math.random()];
        for(let k = 0; k < 500; k++)
        {
            console.log('Currently on run '+(k+1));
            decision = brain.forward(gameInfo); // returns index of chosen action
            reward = decision === 0 ? 1.0 : 0.0;
            brain.backward(reward);
            gameInfo[Math.floor(Math.random()*3)] += Math.random()*2-0.5;
        }

        console.log(brain.average_loss_window); //Returns NaN values
    }
    refreshBot();

</script>
</body>
</html>

Using the example above, I am receiving NaN values in brain.average_loss_window. The initNet() function is exactly the same as in my actual code, but to keep the example minimal I simplified the refreshBot function based on the example given at the bottom of this page. I am still getting NaN values, so I'm sure none of that code is the issue. Let me know if I can provide any more info.

Thanks.

NoamGaash commented 4 years ago

I noticed you set inputNum = 4, but gameInfo = [Math.random(), Math.random(), Math.random()] only contains three elements, so the declared input size doesn't match the state you pass to brain.forward.

Change inputNum to 3, or add a fourth element to gameInfo.
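For example, either of the following (sketched against your snippet above) would make the sizes match:

    // Option 1: declare only as many inputs as you actually pass in
    inputNum = 3;

    // Option 2: keep inputNum = 4 and pass a fourth value each step
    gameInfo = [Math.random(), Math.random(), Math.random(), Math.random()];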

here is a working example with 4 inputs

BTW, good luck with your Tetris project! Sounds promising.

RyleyGG commented 4 years ago

@NoamGaash The inputNum = 4 was an artifact from my code that I forgot to change to reflect the loop I used in the example code :/.

Nonetheless, it helped me figure out the actual problem. The original set of inputs I was passing to brain.forward included a couple of arrays, so it looked something like gameInfo = [2d array, array, 2d array, number]. Putting an array into the set of inputs seems to break something in the forward function, so without changing the source code the solution is to pass only numbers as inputs.
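In case anyone else runs into this, the workaround could look something like the sketch below. The flattenInputs helper is hypothetical (not part of the library), and inputNum has to match the length of the flattened array:

    // Hypothetical helper: flatten nested arrays into the flat list of
    // numbers that brain.forward() expects.
    function flattenInputs(values) {
        const flat = [];
        (function walk(v) {
            if (Array.isArray(v)) { v.forEach(walk); }
            else { flat.push(Number(v)); }
        })(values);
        return flat;
    }

    // e.g. where gameInfo used to look like [grid2d, rowArray, grid2d, score]
    decision = brain.forward(flattenInputs(gameInfo));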

I also found some other miscellaneous issues in my code that were my own fault rather than the library's. Those have been fixed, and everything appears to be working properly now.

I appreciate it, and thanks for all the help!