technobium / q-learning-java

Reinforcement learning - Q learning in Java

Defect in maximum calculation #3

Open michaldo opened 6 years ago

michaldo commented 6 years ago

A standard pattern is used for maximum calculation:

double maxValue = Double.MIN_VALUE;
for (int nextAction : actionsFromState) {
  double value = Q[nextState][nextAction];

  if (value > maxValue) 
    maxValue = value;
}

However, Double.MIN_VALUE in Java is not the most negative double; it is the smallest positive double value. See https://stackoverflow.com/questions/3884793/why-is-double-min-value-in-not-negative for details.
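A quick way to confirm this is to print the constants. The class and method names below (MinValueDemo, minValueIsPositive, mostNegativeFinite) are made up for illustration and are not part of this repository; only the java.lang.Double constants are real.

```java
final class MinValueDemo {
    // True when Double.MIN_VALUE is strictly greater than zero.
    static boolean minValueIsPositive() {
        return Double.MIN_VALUE > 0.0;
    }

    // The most negative finite double is the negation of Double.MAX_VALUE.
    static double mostNegativeFinite() {
        return -Double.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Double.MIN_VALUE);      // 4.9E-324
        System.out.println(minValueIsPositive());  // true
        System.out.println(mostNegativeFinite());  // -1.7976931348623157E308
        // Negative infinity sits below every finite double.
        System.out.println(Double.NEGATIVE_INFINITY < mostNegativeFinite()); // true
    }
}
```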

In particular, if every value considered in the for-loop is 0 (or negative), the calculated maximum is Double.MIN_VALUE, because Double.MIN_VALUE > 0.
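The effect can be reproduced in isolation with a minimal sketch. It uses the same loop shape as the snippet in this issue but takes the seed as a parameter, so both seeds can be compared on the same input. SeedDemo and maxWithSeed are hypothetical names, and a plain double[] stands in for the Q[nextState][...] row.

```java
final class SeedDemo {
    // Same loop as in the issue, with the starting value passed in.
    static double maxWithSeed(double[] values, double seed) {
        double maxValue = seed;
        for (double value : values) {
            if (value > maxValue) {
                maxValue = value;
            }
        }
        return maxValue;
    }

    public static void main(String[] args) {
        double[] zeros = {0.0, 0.0};
        double[] negatives = {-5.0, -2.0};

        // Seeded with MIN_VALUE, the seed itself wins: no element exceeds it.
        System.out.println(maxWithSeed(zeros, Double.MIN_VALUE));     // 4.9E-324
        System.out.println(maxWithSeed(negatives, Double.MIN_VALUE)); // 4.9E-324

        // Seeded with NEGATIVE_INFINITY, the true maximum is returned.
        System.out.println(maxWithSeed(zeros, Double.NEGATIVE_INFINITY));     // 0.0
        System.out.println(maxWithSeed(negatives, Double.NEGATIVE_INFINITY)); // -2.0
    }
}
```

For an all-zero row the error is tiny (4.9E-324 instead of 0), which may be why it goes unnoticed. For a row of negative values the result is simply wrong.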

However, replacing Double.MIN_VALUE with Double.NEGATIVE_INFINITY causes a wrong calculation: in that case maxQ(8) is Double.NEGATIVE_INFINITY and the whole calculation fails.
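One possible explanation, stated here as an assumption rather than something verified against the repository, is that no actions are available from state 8, so the loop body never runs and the seed is returned unchanged. The sketch below shows how a returned negative infinity would then propagate through a typical Q-learning update term, and one way to guard against it. EmptyMaxDemo and maxOrDefault are hypothetical names, and the reward 100 and discount 0.9 are illustrative values only. Returning 0 for a state with no actions is a common convention for terminal states, not necessarily what this project intends.

```java
final class EmptyMaxDemo {
    // Hypothetical helper: maximum of the values, or a caller-chosen
    // fallback when there is nothing to take the maximum of.
    static double maxOrDefault(double[] values, double whenEmpty) {
        if (values.length == 0) {
            return whenEmpty;
        }
        double maxValue = Double.NEGATIVE_INFINITY;
        for (double value : values) {
            if (value > maxValue) {
                maxValue = value;
            }
        }
        return maxValue;
    }

    public static void main(String[] args) {
        double[] noActions = {};   // assumed situation for state 8
        double reward = 100.0;     // illustrative value
        double gamma = 0.9;        // illustrative value

        // An unguarded NEGATIVE_INFINITY seed would come back as-is
        // and swallow the whole update term.
        double unguarded = Double.NEGATIVE_INFINITY;
        System.out.println(reward + gamma * unguarded); // -Infinity

        // With the guard, an empty action set contributes 0.
        double guarded = maxOrDefault(noActions, 0.0);
        System.out.println(reward + gamma * guarded);   // 100.0
    }
}
```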

Summary: the maximum calculation is implemented incorrectly, or is at least misleading. It is hard for me to propose a solution because it is not clear to me what maxQ(8) is expected to be.

muhammad-ahsan commented 6 years ago

I checked the code and found the same issue as described above. Please use double maxValue = Double.NEGATIVE_INFINITY; otherwise the condition in if (value > maxValue) maxValue = value; is never satisfied when it should be.

technobium commented 6 years ago

Thanks for the observation. I changed the code.

SimonAsenime commented 6 years ago

While testing your code, I found that having double maxValue = Double.NEGATIVE_INFINITY throws off the whole calculation. Using Double.MIN_VALUE should fix things. Well, it runs perfectly using MIN_VALUE anyway.