hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0
630 stars 90 forks source link

trpc.rpc_sync consumed most time #175

Open fanlongbd opened 1 year ago

fanlongbd commented 1 year ago

We used VIT model in examples and test performance.

  1. Test env: V100-32G,Batch size=128
  2. We found 1 GPU or TP=2, we breakdown the time from received request to return result and rpc_sync from worker to master cost 95% of whole process. pMVQAmlKWn First orange: master send data to worker (13 ms Second orange: woker send result to master ( ~423 ms)

the 423ms is from before pipe.py/def send /trpc.rpc_sync(self.dest, rpc_queue_put, args=(self.remote_queue, data)) to first line in def rpc_queue_put(q: trpc.RRef, data: Any). just trpc.rpc_sync