Closed jon-chuang closed 3 years ago
I could see the need for this too.
I was able to create a bootleg version of memory: I pickle the state and store it by reference inside an episode, then reconstruct the memory by selecting which stored states to restore. I have no idea whether the RLlib team will track this idea further, though I'm an advocate for crafting your own solution.
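A minimal sketch of what I mean, assuming a plain Python dict keyed by step index (the class and method names here are hypothetical, not part of RLlib's API):

```python
import pickle


class EpisodeMemory:
    """Bootleg episodic memory: pickle states and keep them by key
    inside an episode, so the episode only holds compact references."""

    def __init__(self):
        self._store = {}  # step index -> pickled state bytes

    def remember(self, step, state):
        # Serialize the state; the episode keeps only the bytes.
        self._store[step] = pickle.dumps(state)

    def recall(self, steps):
        # Reconstruct memory by selecting which stored states to unpickle.
        return [pickle.loads(self._store[s]) for s in steps if s in self._store]


# Usage: store a state at step 0, then selectively restore it.
mem = EpisodeMemory()
mem.remember(0, {"obs": [1, 2]})
restored = mem.recall([0])
missing = mem.recall([5])  # unknown steps are simply skipped
```

Anything picklable works as a state here; for large observations you would probably want to store the bytes out-of-process instead of in a dict.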
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
We should add support for various memory modules:
- MERLIN: buffer of memory encodings through predictive coding and retroactive updating (https://arxiv.org/pdf/1803.10760.pdf)
- SPTM: topological memory (graph) of vision embeddings for optimal path planning and localisation (https://arxiv.org/pdf/1803.00653.pdf)
- Episodic curiosity through reachability (Siamese-network reachability metric) (https://arxiv.org/pdf/1810.02274.pdf)
- Relational Memory Core: fixed-size buffer with a recurrent transformer forward pass (https://arxiv.org/pdf/1806.01822.pdf, https://arxiv.org/pdf/1806.01830.pdf) (Relation to capsules, Transformer-XL? Difference from an RNN: the transformer architecture allows arbitrary buffer sizes, though it is hard to tell from the papers how far this scales.)
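To make the Relational Memory Core idea concrete, here is a minimal numpy sketch of one recurrent step: the new input is stacked with a fixed-size memory buffer, the old slots attend over the combined set, and a residual update produces the next memory. This is a simplified single-head version under my own assumptions (`rmc_step` is a hypothetical name; the paper uses multi-head attention plus an MLP and gating):

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def rmc_step(memory, x, Wq, Wk, Wv):
    """One recurrent step of a simplified Relational Memory Core.

    memory: (slots, dim) fixed-size buffer carried across timesteps
    x:      (dim,) new input for this timestep
    """
    slots, dim = memory.shape
    # Stack the input with memory so attention can relate it to every slot.
    m_plus = np.vstack([memory, x[None, :]])       # (slots + 1, dim)
    q = memory @ Wq                                # queries from old slots only
    k = m_plus @ Wk
    v = m_plus @ Wv
    attn = softmax(q @ k.T / np.sqrt(dim))         # (slots, slots + 1)
    # Residual update: the buffer keeps the same fixed size every step.
    return memory + attn @ v


# Example: one step leaves the buffer shape unchanged.
rng = np.random.default_rng(0)
dim = 8
memory = rng.normal(size=(4, dim))
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
new_memory = rmc_step(memory, rng.normal(size=dim), Wq, Wk, Wv)
```

Because queries come only from the existing slots, the buffer never grows, which is the property that distinguishes this from simply appending to a transformer context.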
These are all DeepMind papers :) They seem to think that memory is a vital component of future agents. IMHO, we should too :)
Neural Map: cheats by using the ground-truth agent position, though this could be approximated with dead reckoning or other SLAM techniques to correct for error. However, I prefer the topological memory approach, which circumvents this. (https://arxiv.org/pdf/1702.08360.pdf)
I would be interested to see comparisons of performance in various navigation environments.