hubbs5 / or-gym

Environments for OR and RL Research
MIT License

Applying DFO to InvManagement-v0 failed #16

Closed: Thomas-Gentilhomme closed this issue 2 years ago

Thomas-Gentilhomme commented 3 years ago

Hi!

I have seen your incredible tutorial where you apply PPO and DFO to InvManagement-v1 (here), using Powell's method.

I have successfully reproduced your results, but when I try the same with InvManagement-v0 it does not work, and I have no idea why. The optimal base stock level found is [2 1 3], which gives a mean reward of ~44 (far from the 360.9 reported in your paper). I guess it comes from the initial base stock level ([1 1 1] by default for InvManagement-v1), but I don't know what to do about it.

Can you help me?

Many thanks!

Thomas G.

hdavid16 commented 3 years ago

What are the termination statuses in the printout for the DFO?

Thomas-Gentilhomme commented 3 years ago

Hi @hdavid16! The termination status in the DFO printout for InvManagement-v0 is the following:

Re-order levels: [2 1 3]
DFO Info:
direc: array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
fun: -0.13299508348235206
message: 'Optimization terminated successfully.'
nfev: 101
nit: 2
status: 0
success: True
x: array([2.01824427, 0.93369168, 2.55297131])

When I apply the same piece of code to InvManagement-v1 (the one from the tutorial), I obtain the following output:

Re-order levels: [540 216 81]
DFO Info:
direc: array([[ 0. , 0. , 1. ], [ 0. , 1. , 0. ], [206.39353826, 81.74560612, 28.78995703]])
fun: -0.9450780368543933
message: 'Optimization terminated successfully.'
nfev: 212
nit: 5
status: 0
success: True
x: array([539.7995151 , 216.38046861, 80.66902905])

Many thanks!

Thomas

hdavid16 commented 3 years ago

My guess would be that the DFO got stuck in a local optimum. Did both versions have the same initial conditions?
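
To make the local-optimum concern concrete, here is a minimal toy sketch in plain Python. It does not use or-gym or the tutorial's code: `toy_cost` is a made-up objective with a shallow basin at (2, 1, 3) and a deeper one at (54, 22, 8), and `compass_search` is a simple coordinate search used as a stand-in for Powell's method. All names and numbers are assumptions chosen for illustration. It only shows that a local derivative-free search can report success from one starting point while a different starting point reaches a better solution.

```python
def toy_cost(levels):
    """Hypothetical stand-in for the negative normalized reward.

    Levels are rounded to integers, so the surface is piecewise constant.
    Both basins (their locations and depths) are invented for this sketch.
    """
    x = [round(v) for v in levels]
    d_shallow = sum((a - b) ** 2 for a, b in zip(x, (2, 1, 3)))
    d_deep = sum((a - b) ** 2 for a, b in zip(x, (54, 22, 8)))
    shallow = -0.13 / (1 + d_shallow)
    deep = -0.95 / (1 + 0.05 * d_deep)
    return min(shallow, deep)


def compass_search(f, x0, step=1.0, min_step=0.5, max_iter=1000):
    """Local derivative-free search: try +/- step along each coordinate.

    This is NOT Powell's method, only a simpler local search that shares
    its dependence on the starting point.
    """
    x = [float(v) for v in x0]
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step /= 2  # shrink the step when no neighbour improves
            if step < min_step:
                break
    return [round(v) for v in x], fx


# Single start at [1, 1, 1]: the search stops in the shallow basin.
single_levels, single_cost = compass_search(toy_cost, (1, 1, 1))

# Multi-start: run the same search from several initial guesses, keep the best.
starts = [(1, 1, 1), (50, 20, 10)]
best_levels, best_cost = min(
    (compass_search(toy_cost, s) for s in starts), key=lambda r: r[1]
)

print("single start:", single_levels, single_cost)
print("multi-start: ", best_levels, best_cost)
```

If the same thing is happening with InvManagement-v0, one option would be to call the optimizer from several initial base stock levels and keep the best result, for example by repeating `scipy.optimize.minimize(..., method='Powell')` with different `x0` values. Whether that is the actual cause here is not confirmed in this thread.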