matpalm / tcn_grasping

pybullet grasping with time contrastive network embeddings
http://matpalm.com/blog/pybullet_tcn_grasping/
MIT License

replay buffers #3

Open matpalm opened 5 years ago

matpalm commented 5 years ago

in addition to the offline mining, another win would be to borrow an idea from offline reinforcement learning: the replay buffer. if we mine triples we can use them to populate a replay buffer and then sample training batches from it. the simplest approach would be to treat the buffer as a FIFO queue and expire entries based on time. more complex approaches can use importance sampling ideas to keep examples around while they continue to add value to training. i saw huge wins from implementing Prioritised Experience Replay for my Malmomo project
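a rough sketch of the two ideas above, not code from this repo. `TripletReplayBuffer`, `sample_prioritised`, `max_age_secs` and `alpha` are all hypothetical names made up for illustration, and a triple is assumed to be an `(anchor, positive, negative)` tuple. the first part is the FIFO queue with time-based expiry; the second is only the proportional sampling step of prioritised replay, leaving out the sum tree and importance-weight correction a full implementation would use.

```python
import random
import time
from collections import deque


class TripletReplayBuffer:
    """FIFO buffer of mined triples that expires entries by age (hypothetical sketch)."""

    def __init__(self, max_age_secs, clock=time.monotonic):
        self.max_age_secs = max_age_secs
        self.clock = clock          # injectable so expiry can be tested
        self.entries = deque()      # (insert_time, triple), oldest first

    def add(self, triple):
        # new triples go on the right, so the left end is always the oldest
        self.entries.append((self.clock(), triple))

    def expire(self):
        # drop everything older than max_age_secs
        cutoff = self.clock() - self.max_age_secs
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()

    def sample(self, batch_size, rng=random):
        # uniform sample without replacement from the unexpired triples;
        # returns fewer than batch_size if the buffer is small
        self.expire()
        k = min(batch_size, len(self.entries))
        return [triple for _, triple in rng.sample(list(self.entries), k)]

    def __len__(self):
        return len(self.entries)


def sample_prioritised(triples, priorities, batch_size, alpha=0.6, rng=random):
    # sample with replacement, with probability proportional to priority ** alpha.
    # the priority is assumed to be something like the last triplet loss seen
    # for that example; alpha=0 falls back to uniform sampling.
    weights = [p ** alpha for p in priorities]
    idxs = rng.choices(range(len(triples)), weights=weights, k=batch_size)
    return [triples[i] for i in idxs]
```

usage would be something like calling `buf.add(triple)` from the mining process and `buf.sample(batch_size)` from the training loop. expiring by age keeps the buffer close to what the current embedding finds hard, at the cost of throwing away old triples that might still be useful, which is the gap the prioritised variant is meant to cover.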