base stock policy fails

hubbs5 / or-gym

Environments for OR and RL Research

MIT License

Closed riccardopoiani closed 1 year ago

riccardopoiani commented 1 year ago

Calculating the base stock policy leads to a NaN error.
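For context, here is a minimal sketch of the textbook base stock (order-up-to) rule with an explicit NaN guard. It is not or-gym's implementation: the function name `base_stock_order` and its parameters are hypothetical, chosen only for illustration. A guard like this can help show whether a NaN comes from the inputs rather than from the order rule itself.

```python
import math

def base_stock_order(inventory_position, base_stock_level, capacity=math.inf):
    """Hypothetical helper: order enough to raise the inventory position
    to the base stock level, limited by an optional capacity."""
    # Fail early on NaN inputs instead of letting them propagate silently.
    if math.isnan(inventory_position) or math.isnan(base_stock_level):
        raise ValueError("NaN in base stock inputs")
    # Never order a negative quantity when inventory exceeds the target level.
    shortfall = max(base_stock_level - inventory_position, 0)
    return min(shortfall, capacity)

print(base_stock_order(40, 100))               # 60
print(base_stock_order(40, 100, capacity=50))  # 50
print(base_stock_order(120, 100))              # 0
```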

riccardopoiani commented 1 year ago

I have also followed this tutorial: https://www.datahubbs.com/how-to-use-deep-reinforcement-learning-to-improve-your-supply-chain/ Is it normal that, with higher values of the discount factor (e.g., 0.999), DFO can only reach a reward of -400?