Mshz2 closed this issue 1 year ago.
Hi @Mshz2

WarpDrive is mostly designed for running multi-agent reinforcement learning, i.e., each GPU thread works in parallel on one agent. In this sense, a single-agent environment, as in your case, may not be the best fit. Although WarpDrive is able to run a single agent, it is not optimal for that case as designed; we are going to include a single-agent adaptor later, though. Besides, your 4-D grid size seems extremely large, roughly ~1G cells per environment, so if you want to run many replicas in parallel, memory would be a serious constraint.
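As a rough back-of-the-envelope (assuming one float32 value per cell, which is an assumption on our side):

```python
import numpy as np

# One environment replica at the grid size you describe.
cells = 468 * 225 * 182 * 54                      # 1,034,888,400 cells (~1G)
bytes_per_env = cells * np.dtype(np.float32).itemsize
print(f"{cells:,} cells -> ~{bytes_per_env / 1e9:.1f} GB per environment")
# ~4.1 GB per replica; many parallel replicas multiply this on a single GPU.
```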
Regards,
Thanks for your response. There could also be a case with four 1-D games. How about assigning one agent per dimension, with the agents taking actions one after another? Do you think that would reduce the load? For my case, one environment is sufficient.
I believe it would be much better if you can use 4 agents, each taking care of one dimension individually.
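For example (just a sketch with the standard gym spaces, not WarpDrive-specific code), each of the 4 agents could own one dimension of the jump:

```python
from gym import spaces

# Grid dimensions of the 4-D environment.
DIMS = [468, 225, 182, 54]

# One agent per dimension: agent i only chooses the index along dimension i.
# A full jump target is assembled from the four agents' individual actions.
action_spaces = {agent_id: spaces.Discrete(n) for agent_id, n in enumerate(DIMS)}

# e.g. actions {0: 172, 1: 54, 2: 101, 3: 37} together select cell [172, 54, 101, 37]
```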
Dear all, I am new to reinforcement learning, but I am fascinated by WarpDrive. I was wondering if you could help me build my custom env for a little study project. The story of my env is as follows: I want to create a gym 4D environment, a 468x225x182x54 grid (i.e., 1,034,888,400 unique cells). Every cell in this space has a unique value, and my agent (e.g., a rabbit) can jump anywhere in this space, setting cells to zero value (a cell is "burned" after its points are collected by the rabbit). The agent is rewarded based on the reduction of the environment's overall points (e.g., 2000) caused by zeroing cell values. Which cells carry more points (reward) is unknown to the agent but fixed, and it is the agent's task to find out, by jumping, how to burn the higher-value cells before the episode length runs out. I thought my action space could be defined as follows, for example.
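A minimal sketch of what I mean, assuming gym's standard MultiDiscrete space:

```python
from gym import spaces

# The agent picks one index per grid dimension, i.e. the cell it jumps to.
action_space = spaces.MultiDiscrete([468, 225, 182, 54])

# A sample action, i.e. one jump target:
action = [172, 54, 101, 37]
```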
where my agent collects the reward at location [172 54 101 37], and the value at that cell is now zero. When the game starts, the agent jumps into this 4D space (I assume it is better to start each episode at a fixed position with a buffer action, i.e., no values are zeroed on that first step, and during policy training the agent learns to begin with an action that yields a globally better reward). Furthermore, I want the step function for the game to work like this: the rabbit makes a jump, then the reward is returned. Also, the returned state is the 4D space with the same shape, but its values change as cells are zeroed by previous actions.
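Roughly, I picture the step logic like this sketch (classic gym API; a tiny stand-in grid shape is used just so the sketch runs, since allocating the real 468x225x182x54 grid here would be huge):

```python
import numpy as np
from gym import Env, spaces

# Real shape is (468, 225, 182, 54); a tiny stand-in keeps the sketch runnable.
GRID_SHAPE = (8, 6, 5, 4)


class RabbitJumpEnv(Env):
    """The rabbit jumps to a cell, collects its value as reward, and the cell is burned (zeroed)."""

    def __init__(self, episode_length=100):
        self.episode_length = episode_length
        self.action_space = spaces.MultiDiscrete(GRID_SHAPE)
        # I am not sure this full-grid Box is the right observation space; see my question below.
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=GRID_SHAPE, dtype=np.float32)
        # Fixed cell values, unknown to the agent (random here only for the sketch).
        self._initial_grid = np.random.default_rng(0).random(GRID_SHAPE, dtype=np.float32)

    def reset(self):
        self.t = 0
        self.grid = self._initial_grid.copy()
        return self.grid

    def step(self, action):
        self.t += 1
        idx = tuple(action)              # e.g. (172, 54, 101, 37) on the real grid
        reward = float(self.grid[idx])   # points collected at that cell
        self.grid[idx] = 0.0             # the cell is "burned"
        done = self.t >= self.episode_length
        return self.grid, reward, done, {}
```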
However, I don't know how I should define my observation space, and I would really appreciate your help.
So far, for example, if I modify your gridworld example env: