pagand / ORL_optimizer

offline RL optimizer

Halfcheetah LSTM transition and reward #13

Status: Closed (closed by jnqian99 3 months ago)

jnqian99 commented 3 months ago

For halfcheetah-medium-v2, the overall R^2 values are:

R^2 NAR: 0.9920237939804792
R^2 AR: 0.9687655176967382

And for the individual states, the R^2 values are:

state 0: R^2 NAR: 0.7300270795822144, R^2 AR: 0.3690910339355469
state 1: R^2 NAR: 0.7682727426290512, R^2 AR: 0.6152023077011108
state 2: R^2 NAR: 0.9749778918921947, R^2 AR: 0.8899503797292709
state 3: R^2 NAR: 0.9800169412046671, R^2 AR: 0.941743116825819
state 4: R^2 NAR: 0.9236404746770859, R^2 AR: 0.8204318881034851
state 5: R^2 NAR: 0.9874668614938855, R^2 AR: 0.9416524693369865
state 6: R^2 NAR: 0.9486607722938061, R^2 AR: 0.8477649688720703
state 7: R^2 NAR: 0.9722550455480814, R^2 AR: 0.9106765314936638
state 8: R^2 NAR: 0.9922399176284671, R^2 AR: 0.9282858818769455
state 9: R^2 NAR: 0.9421906657516956, R^2 AR: 0.7696249336004257
state 10: R^2 NAR: 0.9783415794372559, R^2 AR: 0.8957170322537422
state 11: R^2 NAR: 0.993208022788167, R^2 AR: 0.9689102619886398
state 12: R^2 NAR: 0.993740382604301, R^2 AR: 0.9761422052979469
state 13: R^2 NAR: 0.967949952930212, R^2 AR: 0.8987857103347778
state 14: R^2 NAR: 0.9961293051019311, R^2 AR: 0.987827131524682
state 15: R^2 NAR: 0.9828768614679575, R^2 AR: 0.9173456653952599
state 16: R^2 NAR: 0.9940944681875408, R^2 AR: 0.9693082924932241
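
For anyone reproducing these numbers, here is a minimal sketch of how per-state and pooled R^2 scores could be computed with NumPy. It is not the code used in this repo: the function names `r2_per_dimension` and `r2_pooled` are hypothetical, the data is synthetic, and pooling over the flattened array is only one possible way to get a single overall score (the thread does not say how the overall value was aggregated).

```python
import numpy as np


def r2_per_dimension(y_true, y_pred):
    """R^2 for each column of two (T, D) arrays: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


def r2_pooled(y_true, y_pred):
    """Single R^2 over all entries, treating the flattened arrays as one series.

    Assumption: this is just one way to aggregate across state dimensions.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for ground-truth and predicted states (17 dimensions,
# matching states 0..16 above); replace with real model outputs.
rng = np.random.default_rng(0)
truth = rng.normal(size=(1000, 17))
pred = truth + 0.1 * rng.normal(size=truth.shape)

per_state = r2_per_dimension(truth, pred)
for i, score in enumerate(per_state):
    print(f"state {i} R^2: {score:.4f}")
print(f"pooled R^2: {r2_pooled(truth, pred):.4f}")
```

Running the same two functions once on the NAR predictions and once on the AR predictions would give a listing in the same shape as the one above.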

jnqian99 commented 3 months ago

The graph for state 5 is as follows:

Image

jnqian99 commented 3 months ago

Zoomed-in view of the above graph near the end of the prediction:

Image

jnqian99 commented 3 months ago

Graph for state 12:

Image

jnqian99 commented 3 months ago

Zoomed-in view of the above graph:

Image

jnqian99 commented 3 months ago

For state 0:

Image

jnqian99 commented 3 months ago

Zoomed-in view for state 0 near the end:

Image

jnqian99 commented 3 months ago

Rewards:

Image

jnqian99 commented 3 months ago

Rewards, zoomed in:

Image

pagand commented 3 months ago

@jnqian99 these are looking good. I have marked this as done. If you want to start on a new environment, create a new issue with the timing.

pagand commented 3 months ago

Send the PR for commit #f337764 back for review.