Closed varun-intel closed 3 years ago
I'm trying to reproduce the results in the paper following the documentation. The train_pickplace.py script says that the return should increase to 3000 within a hundred epochs. However, I'm seeing much lower results:
df['AverageReturn']
    0       -50.000000
    1       -50.000000
    2       -50.000000
    3       -50.000000
    4       -50.000000
    ...
    2995     -4.454546
    2996    -12.636364
    2997    -13.909091
    2998    -12.818182
    2999    -25.909090
The task isn't completely solved. The maximum value of 'Test Final Num Blocks Stacked' is 0.8, but the average value is a lot lower, around 0.4.
Are these the expected numbers or is there something I can do to improve them?
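For anyone comparing their own runs, here is a minimal sketch of how the summary numbers above (maximum, overall average, and average over the most recent epochs) could be computed. It assumes the training log is a CSV with one row per epoch and columns named 'AverageReturn' and 'Test Final Num Blocks Stacked'. The `summarize` helper and the sample values are hypothetical and only for illustration; they are not part of the repository and not real results.

```python
import csv
import io
from statistics import mean


def summarize(rows, column, window=100):
    """Return the max, overall mean, and mean of the last `window` values of `column`."""
    # Skip rows where the column is missing or empty (e.g. epochs without a test rollout).
    values = [float(r[column]) for r in rows if r.get(column) not in (None, "")]
    if not values:
        raise ValueError(f"no values found for column {column!r}")
    tail = values[-window:]
    return {"max": max(values), "mean": mean(values), "last_window_mean": mean(tail)}


# Tiny in-memory stand-in for a training log; these numbers are made up.
sample_log = io.StringIO(
    "AverageReturn,Test Final Num Blocks Stacked\n"
    "-50.0,0.0\n"
    "-25.0,0.4\n"
    "-10.0,0.8\n"
    "-15.0,0.4\n"
)
rows = list(csv.DictReader(sample_log))

print(summarize(rows, "AverageReturn", window=2))
print(summarize(rows, "Test Final Num Blocks Stacked", window=2))
```

Reporting the mean over a trailing window alongside the maximum makes it easier to tell a single lucky epoch apart from a policy that has actually converged.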
Sorry, that comment is totally wrong. It should be deleted.
Please check the appendix of the arXiv paper for training times.