nilscrm / stackelberg-ml

Evaluate pretrained policy model #16

Closed nilscrm closed 3 months ago

nilscrm commented 4 months ago

Evaluate the pre-trained policy model on various environments and check whether it is optimal in each.

nilscrm commented 4 months ago

First analysis on 10 random world models. Pre-trained our policy and the world-model-specific PPO models for 10,000 training steps each.

Scores are shown as (average reward, standard deviation).
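For reference, a minimal sketch of how such a (mean, std) pair can be derived from per-episode returns. The helper name `summarize` is hypothetical, and the use of the population standard deviation is an assumption; the actual evaluation script may compute it differently.

```python
import statistics


def summarize(episode_returns):
    """Return (mean, std) of per-episode returns, in the same tuple layout as the log."""
    mean = statistics.fmean(episode_returns)
    # Assumption: population standard deviation; the real script may use the sample version.
    std = statistics.pstdev(episode_returns)
    return mean, std
```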

Evaluating random model 0
Env specific baseline: (3.695499999038875, 4.253122940104274)
Our contextualized policy: (2.8959999991953373, 3.1410323137797653)
Evaluating random model 1
Env specific baseline: (2.9894999964162707, 2.749984498417424)
Our contextualized policy: (1.6679999989271164, 1.3084823265621541)
Evaluating random model 2
Env specific baseline: (3.7824999985471366, 4.530073811340347)
Our contextualized policy: (3.7254999984428285, 4.145826182958247)
Evaluating random model 3
Env specific baseline: (1.3454999996349215, 1.120560016135359)
Our contextualized policy: (1.2774999995157124, 0.7351998026641404)
Evaluating random model 4
Env specific baseline: (1.2729999998956918, 0.6744412501507899)
Our contextualized policy: (1.2294999998435379, 0.5805211020523027)
Evaluating random model 5
Env specific baseline: (2.220999999716878, 2.8344415675050736)
Our contextualized policy: (2.0429999995976686, 1.778342205534533)
Evaluating random model 6
Env specific baseline: (0.9274999987706543, 0.14201672535812704)
Our contextualized policy: (0.9344999988749624, 0.15193666565525063)
Evaluating random model 7
Env specific baseline: (1.4484999974444508, 0.9524299181292879)
Our contextualized policy: (1.4809999974817039, 1.1757929238394074)
Evaluating random model 8
Env specific baseline: (4.765499996505678, 6.591038212985427)
Our contextualized policy: (2.209499999396503, 3.618215685834796)
Evaluating random model 9
Env specific baseline: (1.1604999993368983, 0.992806501760879)
Our contextualized policy: (0.9429999991506338, 0.08000625094806446)

Will investigate the differences further.
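As a starting point for that comparison, here is a rough sketch that ranks the world models by the gap between the baseline's mean reward and the contextualized policy's mean reward. The means are copied from the log above and rounded to four decimals; the names `mean_rewards` and `reward_gaps` are hypothetical and not part of the repository. The standard deviations are left out here, so the gaps should still be read against the spread reported in the log.

```python
# model index -> (baseline mean reward, contextualized-policy mean reward),
# copied from the log above and rounded to 4 decimals.
mean_rewards = {
    0: (3.6955, 2.8960),
    1: (2.9895, 1.6680),
    2: (3.7825, 3.7255),
    3: (1.3455, 1.2775),
    4: (1.2730, 1.2295),
    5: (2.2210, 2.0430),
    6: (0.9275, 0.9345),
    7: (1.4485, 1.4810),
    8: (4.7655, 2.2095),
    9: (1.1605, 0.9430),
}


def reward_gaps(means):
    """Map model index -> baseline mean minus contextualized-policy mean."""
    return {i: baseline - ours for i, (baseline, ours) in means.items()}


gaps = reward_gaps(mean_rewards)
# Model where the contextualized policy falls furthest behind the baseline.
worst = max(gaps, key=gaps.get)

# Print models from largest to smallest gap; a negative gap means our policy scored higher.
for i, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"model {i}: gap {gap:+.4f}")
```

Sorting by gap makes it easy to see which world models to inspect first, and which ones the contextualized policy already matches or beats on average.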

nilscrm commented 3 months ago

The policy model now learns to play optimally in all world models.