richardrl / rlkit-relational

Codebase for ICRA 2020 paper "Towards Practical Multi-object Manipulation using Relational Reinforcement Learning"
MIT License

Reproducing Results #10

Closed: varun-intel closed this issue 3 years ago

varun-intel commented 3 years ago

I'm trying to reproduce the results in the paper following the documentation. The train_pickplace.py script says that the return should increase to 3000 within a hundred epochs. However, I'm seeing much lower results:

```
>>> df['AverageReturn']
0       -50.000000
1       -50.000000
2       -50.000000
3       -50.000000
4       -50.000000
           ...
2995     -4.454546
2996    -12.636364
2997    -13.909091
2998    -12.818182
2999    -25.909090
```

The task also isn't fully solved: the maximum value of 'Test Final Num Blocks Stacked' is 0.8, but the average is much lower, around 0.4.
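For reference, here is roughly how I'm reading these metrics from the logged progress.csv (the experiment directory path below is specific to my run, so substitute your own):

```python
# Sketch for inspecting training metrics from the logged progress.csv.
# The path is a placeholder for the experiment directory that
# train_pickplace.py creates.
import pandas as pd

df = pd.read_csv("data/pickplace_experiment/progress.csv")  # placeholder path

# Average return over the last 100 epochs, to smooth out per-epoch noise.
print("Mean AverageReturn (last 100 epochs):",
      df["AverageReturn"].tail(100).mean())

# Evaluation success metric: number of blocks stacked at episode end.
col = "Test Final Num Blocks Stacked"
if col in df.columns:
    print(f"Max {col}:", df[col].max())
    print(f"Mean {col} (last 100 epochs):", df[col].tail(100).mean())
```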

Are these the expected numbers or is there something I can do to improve them?

richardrl commented 3 years ago

Sorry, that comment in train_pickplace.py is wrong and should be deleted.

Please check the appendix of the arXiv paper for training times.