Open zhaotyer opened 6 months ago
I met a similar case. Here is my code:

```python
import threading

import mii

def worker(rank, this_model):
    try:
        if this_model is None:
            client = mii.client('qwen')
        else:
            client = this_model
        response = client.generate(["xxx"], max_new_tokens=1024, stop="<|im_end|>", do_sample=False, return_full_text=True)
        print("in worker rank:", rank, " response:", response)
    except Exception as e:
        print(f"Capture error: {e}")
    finally:
        print("final")

model = mii.serve(model_dir, deployment_name="qwen", tensor_parallel=xx, replica_num=replica_num)
job_process = []
for rank in range(0, replica_num):
    if rank == 0:
        job_process.append(threading.Thread(target=worker, args=(rank, model)))
    else:
        job_process.append(threading.Thread(target=worker, args=(rank, None)))
for process in job_process:
    process.start()
for process in job_process:
    process.join()
```

When using `threading.Thread`, it works well. However, it blocks in `client.generate` when using `multiprocessing.Process`.
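One workaround worth trying (my own assumption, not confirmed anywhere in this thread) is to construct the client *inside* each child process rather than passing the `mii.serve` handle through fork, since a gRPC channel inherited across a fork may not work in the child. The structural sketch below uses a hypothetical `fake_generate` stand-in for `client.generate`, because no running MII server is assumed here:

```python
import multiprocessing as mp

def fake_generate(prompt):
    # Hypothetical stand-in for client.generate(); in the real setup each
    # child would call mii.client('qwen') itself, so every process builds
    # its own gRPC channel instead of inheriting one across fork.
    return f"response-to-{prompt}"

def worker(rank, queue):
    # Real code would do: client = mii.client('qwen')  # per-process client
    result = fake_generate(f"prompt-{rank}")
    queue.put((rank, result))

if __name__ == "__main__":
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, queue)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = dict(queue.get() for _ in procs)
    print(results)  # one response per rank
```

The key difference from the code above is that no client object crosses the process boundary; only the deployment name would.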
Since `threading.Thread` does not give real parallelism in Python because of the GIL, this code cannot make full use of concurrency. That means I still need `multiprocessing.Process` to start a new client. However, as mentioned above, that does not work.
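To illustrate the GIL point in isolation (no MII involved): CPU-bound Python code in threads runs one thread at a time under the interpreter lock, while separate processes each get their own interpreter and their own GIL. A minimal sketch using a process pool for CPU-bound work:

```python
import multiprocessing as mp

def cpu_task(n):
    # CPU-bound loop: threads would execute this serially under the GIL,
    # but each worker process in the pool runs it truly in parallel.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        out = pool.map(cpu_task, [1000, 2000])
    print(out)  # [332833500, 2664667000]
```

Note that `client.generate` is largely an I/O-bound gRPC call, so whether threads are "good enough" depends on how much local CPU work each worker does besides waiting on the server.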
I found the official example. Maybe we should start the server and the clients in these ways.
I tried to integrate MII into tritonserver, but encountered some problems. Below is part of my code.

The error is: when I use

MII blocks at

When I use

it is able to infer normally, but gRPC keeps reporting errors (this does not affect inference, but the service is not stable): https://github.com/grpc/grpc/issues/25364