Reproducing Results

I'm trying to reproduce the results in the paper by following the documentation. According to the train_pickplace.py script, the return should increase to 3000 within a hundred epochs. However, I'm seeing much lower values:

    df['AverageReturn']
    0       -50.000000
    1       -50.000000
    2       -50.000000
    3       -50.000000
    4       -50.000000
    ...
    2995     -4.454546
    2996    -12.636364
    2997    -13.909091
    2998    -12.818182
    2999    -25.909090

The task isn't completely solved. The maximum value of 'Test Final Num Blocks Stacked' is 0.8, but the average value is a lot lower, around 0.4.
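In case it helps with comparing runs, here is a minimal sketch of how numbers like the ones above could be summarized with pandas. The column name comes from my log. The helper `summarize_metric`, the window size, and the toy data are hypothetical and only there for illustration. The commented-out `progress.csv` path is also an assumption about where the log lives.

```python
import pandas as pd


def summarize_metric(df, column, window=100):
    """Return the max, overall mean, and mean of the last `window` rows of a logged metric.

    `summarize_metric` is a hypothetical helper, not part of the repository.
    """
    series = df[column].dropna()
    return {
        "max": float(series.max()),
        "mean": float(series.mean()),
        "last_window_mean": float(series.tail(window).mean()),
    }


# With a real run, the log would be loaded first, for example:
# df = pd.read_csv("progress.csv")  # assumed file name, adjust to your log directory

# Toy stand-in data (made up for illustration, not real training output).
demo = pd.DataFrame({"AverageReturn": [-50.0, -50.0, -30.0, -10.0]})
summary = summarize_metric(demo, "AverageReturn", window=2)
print(summary)
```

The same call with `column="Test Final Num Blocks Stacked"` would give the maximum and average for the stacking metric, which makes it easier to state exactly which numbers differ from the expected ones.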

Are these the expected numbers or is there something I can do to improve them?

richardrl / rlkit-relational

Reproducing Results #10