Opinions on running envpool on a dedicated simulator server with e.g. REST API

harpone commented 7 months ago

Maximizing GPU utilization is usually pretty hard in RL, even with parallel environments. So I'm thinking of running the parallel simulator on a separate CPU-only server with as many CPUs as possible, and doing the actual training on a GPU node (same placement group, plus all possible networking optimizations) in whatever cloud.

Could this work, or would the extra network latency make it infeasible?
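One rough way to frame the latency question is a back-of-the-envelope throughput estimate. The sketch below assumes a fully synchronous loop in which every batched step pays one network round trip on top of simulation and inference time. The function name and all numbers are hypothetical placeholders, not measurements of envpool or of any cloud network.

```python
def steps_per_second(num_envs, sim_time_s, rtt_s, infer_time_s):
    """Env steps per second when each batched step waits for one round trip."""
    batch_latency = sim_time_s + rtt_s + infer_time_s
    return num_envs / batch_latency


# Hypothetical numbers, for illustration only.
local = steps_per_second(num_envs=1024, sim_time_s=0.004, rtt_s=0.0, infer_time_s=0.001)
remote = steps_per_second(num_envs=1024, sim_time_s=0.004, rtt_s=0.0005, infer_time_s=0.001)

print(f"local:  {local:,.0f} steps/s")
print(f"remote: {remote:,.0f} steps/s ({remote / local:.0%} of local)")
```

Under this simple model the round trip only matters relative to the per-batch simulation and inference time, so larger batches per request amortize it. Real numbers would have to come from measuring the actual environment and network.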

Trinkle23897 commented 7 months ago

In that case you should use python asyncio
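A minimal sketch of what this could look like, assuming the simulator is split into several remote shards. `remote_step` is a hypothetical stand-in that only sleeps to imitate network and simulation time; it is not an envpool API and makes no real network call.

```python
import asyncio
import time


async def remote_step(shard_id, actions, rtt_s=0.01):
    """Hypothetical stand-in for one network call to a simulator shard."""
    await asyncio.sleep(rtt_s)  # pretend network + simulation time
    return [(shard_id, a) for a in actions]  # fake observations


async def step_all(shard_actions):
    """Issue all shard requests concurrently and wait for every reply."""
    tasks = [remote_step(i, acts) for i, acts in enumerate(shard_actions)]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(step_all([[0, 1], [1, 0], [1, 1], [0, 0]]))
elapsed = time.perf_counter() - start
print(f"{len(results)} shards answered in {elapsed:.3f}s")
```

Because the requests are awaited together with `asyncio.gather`, the waits overlap instead of adding up, which is the main reason to reach for asyncio here.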

harpone commented 7 months ago

> In that case you should use python asyncio

yeah, maybe, but I'm more concerned about the actual feasibility in terms of latency etc. I would imagine this would be more common practice if it's feasible, but haven't been able to find any references...

mavenlin commented 7 months ago

> > In that case you should use python asyncio
>
> yeah, maybe, but I'm more concerned about the actual feasibility in terms of latency etc. I would imagine this would be more common practice if it's feasible, but haven't been able to find any references...

@harpone I believe this is feasible. Internally we have a customized game implementation based on gRPC, but the code written at that time is no longer compatible with the current public version of envpool, and it supports the async API only.

The basic idea is to start a gRPC server on the GPU server and many gRPC clients on the CPU cluster. The gRPC server asynchronously writes to the StateBufferQueue and sends out the actions.

I can provide some help if you need this functionality and would like to implement it on top of envpool.
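To make the data flow concrete, here is a toy in-process sketch of that layout. It is not the internal gRPC code mentioned above and does not use envpool's StateBufferQueue: plain `asyncio.Queue` objects stand in for the network and the state buffer, and `env_client`, `policy_server`, the constant action, and the toy dynamics are all hypothetical.

```python
import asyncio
import random


async def env_client(env_id, state_queue, action_queue, num_steps):
    """Hypothetical CPU-side worker: send a state, wait for its action, step."""
    state = 0
    for _ in range(num_steps):
        await state_queue.put((env_id, state))
        action = await action_queue.get()
        await asyncio.sleep(random.uniform(0.0, 0.002))  # fake simulation time
        state += action  # toy dynamics
    return state


async def policy_server(state_queue, action_queues, batch_size, total_steps):
    """Hypothetical GPU-side loop: batch whatever states have arrived, reply."""
    served = 0
    while served < total_steps:
        batch = [await state_queue.get()]  # block until at least one state
        while len(batch) < batch_size and not state_queue.empty():
            batch.append(state_queue.get_nowait())  # take the rest without waiting
        for env_id, _state in batch:
            await action_queues[env_id].put(1)  # toy policy: always action 1
        served += len(batch)
    return served


async def main(num_envs=8, num_steps=5, batch_size=4):
    state_queue = asyncio.Queue()  # stand-in for the shared state buffer
    action_queues = [asyncio.Queue() for _ in range(num_envs)]
    server = asyncio.create_task(
        policy_server(state_queue, action_queues, batch_size, num_envs * num_steps)
    )
    clients = [
        env_client(i, state_queue, action_queues[i], num_steps)
        for i in range(num_envs)
    ]
    finals = await asyncio.gather(*clients)
    return finals, await server


finals, served = asyncio.run(main())
print(f"served {served} steps, final states: {finals}")
```

The server never waits for a full batch; it takes whichever states are ready, so slow environments do not stall fast ones. In a real deployment the queues would be replaced by gRPC calls between machines.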

sail-sg / envpool

Opinions on running envpool on a dedicated simulator server with e.g. REST API #300