Reimplementation in RL platform (CleanRL) #53

cpwan commented 1 year ago

Hello there, my team has been trying to implement the Attention Model in RL platforms so that we can try out different RL algorithms. Eventually, we succeeded in implementing the most efficient one, PPO, in CleanRL. We are able to train the Attention Model in 3 hours for 50-node problems (it took 25 hours with the original code).

Moreover, we have broken down the Attention Model into several components. This should make it a good resource for anyone interested in learning about or developing the Attention Model.

We implemented the vehicle routing problems with the OpenAI Gym interface, which may make it easier to extend to other new problems.
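For readers who have not seen a routing problem written as an environment, here is a minimal sketch of the idea. It is not taken from RLOR: the class `ToyRoutingEnv`, its observation keys, and the single-vehicle, no-capacity setup are all hypothetical, and it only mimics the classic Gym `reset`/`step` shape without importing `gym`.

```python
import math
import random


class ToyRoutingEnv:
    """Hypothetical single-vehicle routing environment with a Gym-style API."""

    def __init__(self, n_nodes=5, seed=0):
        self.n_nodes = n_nodes
        self.rng = random.Random(seed)

    def reset(self):
        # Node 0 acts as the depot; every node is a random point in the unit square.
        self.coords = [
            (self.rng.random(), self.rng.random()) for _ in range(self.n_nodes)
        ]
        self.current = 0
        self.visited = [False] * self.n_nodes
        self.visited[0] = True
        return self._obs()

    def _obs(self):
        # The action mask tells the policy which nodes may still be visited.
        return {
            "coords": list(self.coords),
            "current": self.current,
            "action_mask": [not v for v in self.visited],
        }

    def step(self, action):
        if self.visited[action]:
            raise ValueError(f"node {action} was already visited")
        # Reward is the negative distance travelled in this step.
        reward = -math.dist(self.coords[self.current], self.coords[action])
        self.current = action
        self.visited[action] = True
        done = all(self.visited)
        if done:
            # Close the tour by returning to the depot.
            reward -= math.dist(self.coords[action], self.coords[0])
        return self._obs(), reward, done, {}


# Roll out a trivial policy that visits the remaining nodes in index order.
env = ToyRoutingEnv(n_nodes=5, seed=0)
obs = env.reset()
done, total_reward, steps = False, 0.0, 0
while not done:
    action = obs["action_mask"].index(True)
    obs, reward, done, _ = env.step(action)
    total_reward += reward
    steps += 1
```

A new problem variant would mostly change the observation and the masking logic, while the training loop keeps calling `reset` and `step` in the same way.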

We have released the source code for our implementation in RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research. Feel free to check it out 😆 !

wouterkool commented 1 year ago

Hi! I'm sorry I'm not watching this repo frequently, but this is great. If you create a PR I'm happy to link to this from the README (otherwise I'll see when I find the time). Before I do that, can you confirm the results you get on the same dataset? Additionally, I would encourage you to reimplement POMO (https://arxiv.org/abs/2010.16011), which is a simple but significant improvement as well.
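As a rough illustration of why POMO is simple to add: its core idea is to sample several rollouts of the same instance from different starting nodes and use their mean reward as a shared baseline. The sketch below shows only that baseline step; the function name `shared_baseline_advantages` and the example rewards are made up for illustration.

```python
def shared_baseline_advantages(rewards):
    """Advantage of each rollout relative to the mean reward of all rollouts.

    `rewards` holds one reward (negative tour length) per rollout of the
    same problem instance, each rollout starting from a different node.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Three hypothetical rollouts of one instance with tour lengths 3, 2 and 4.
advantages = shared_baseline_advantages([-3.0, -2.0, -4.0])
```

Rollouts that beat the group average get a positive advantage, so no separate critic or greedy-rollout baseline is needed for this step.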

wouterkool commented 5 months ago

Hi! Thanks again for your implementation. I have linked to it in the README.

wouterkool / attention-learn-to-route
