yeyupiaoling / PPASR

End-to-end Chinese speech recognition based on PaddlePaddle, from getting started to real-world practice: super simple starter examples and highly practical enterprise projects. Supports the currently most popular DeepSpeech2, Conformer, and Squeezeformer models.
Apache License 2.0

When multiple users run prediction at the same time, do the data conflict? #168

Closed yourengod closed 9 months ago

yourengod commented 10 months ago

I see that the prediction code uses a single instance: `predictor = PPASRPredictor(configs=args.configs, model_path=args.model_path, use_gpu=args.use_gpu, use_pun=args.use_pun, pun_model_dir=args.pun_model_dir)`. When multiple users access it concurrently, could the model's predictions get mixed up? After all, it is a deep model.

yeyupiaoling commented 10 months ago

Yes, it can.

yeyupiaoling commented 10 months ago

@yourengod You can use multiple processes.

yourengod commented 10 months ago

https://superfastpython.com/threadpoolexecutor-initializer/ There is an example here; I tested it and it works.

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
import time
import threading

# function for initializing each worker thread
def initializer_worker():
    # get the unique name of this thread
    name = threading.current_thread().name
    print(f'Initializing worker thread {name}')

# time.sleep(2) simulates the latency of a network request
def task(index):
    time.sleep(2)
    print("download video {} finished at {}\n".format(index, time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime())))
    return index

executor = ThreadPoolExecutor(max_workers=2, initializer=initializer_worker)
urls = [1, 2, 3, 4, 5]
all_task = [executor.submit(task, url) for url in urls]
wait(all_task, return_when=ALL_COMPLETED)
print("main")
```
yourengod commented 10 months ago
```python
from concurrent.futures import ThreadPoolExecutor
import time
import threading

# function for initializing each worker thread
def initializer_worker():
    # get the unique name of this thread
    name = threading.current_thread().name
    print(f'Initializing worker thread {name}')

# time.sleep(2) simulates the latency of a network request
def task(index):
    time.sleep(2)
    print("download video {} finished at {}\n".format(index, time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime())))
    return index

executor = ThreadPoolExecutor(max_workers=2, initializer=initializer_worker)

future1 = executor.submit(task, 1)
print(future1.result())

future5 = executor.submit(task, 5)
print(future5.result())

print("main")
```

When the program calls a Future's `result()` method to fetch the result, the method blocks the current thread; if no `timeout` argument is given, the current thread stays blocked until the task behind the Future returns.
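As an illustration of the blocking behavior described above (a minimal sketch, not from the project): `concurrent.futures.as_completed` yields futures as they finish, so `result()` then returns immediately instead of blocking on one future at a time in submission order:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(index):
    # a larger index sleeps longer, so completion order differs from submit order
    time.sleep(0.1 * index)
    return index

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(task, i) for i in (3, 1, 2)]
    # as_completed yields each future as soon as it finishes;
    # calling result() on a finished future does not block
    done_order = [f.result() for f in as_completed(futures)]

print(done_order)
```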

yeyupiaoling commented 10 months ago

That is how I did it before. You can check the earlier history, but I removed it later.

yourengod commented 10 months ago

Why did you remove it?

yeyupiaoling commented 10 months ago

Because many developers reported that it did not work for them. It seemed to work only on Ubuntu 18, so for the sake of broad compatibility I removed it.

yeyupiaoling commented 10 months ago

It can be solved with nginx load balancing; that is a more reasonable approach.
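A minimal sketch of the nginx load-balancing idea: run several single-process PPASR services on different local ports and let nginx round-robin requests across them (the ports and listen address here are hypothetical):

```nginx
# each backend is one single-process PPASR service with its own predictor
upstream ppasr_backend {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

server {
    listen 80;
    location / {
        # nginx distributes incoming requests across the backends,
        # so no single process handles concurrent users' data
        proxy_pass http://ppasr_backend;
    }
}
```

Each backend process holds its own model instance, so requests are isolated without any changes to the prediction code.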

yourengod commented 10 months ago
```python
from time import sleep
from random import random
import threading
from concurrent.futures import ThreadPoolExecutor

# function for initializing each worker thread
def initializer_worker(locker):
    # generate a unique value for this worker thread and store it
    # in the thread-local "locker"
    locker.key = random()
    print(f'Initializing worker thread {locker.key}')

# a mock task that sleeps for a random amount of time less than one second
def task(locker):
    # access the key that belongs to the current worker thread
    mykey = locker.key
    # make use of it
    print(type(mykey))
    sleep(mykey)
    return f'Worker using {mykey}'

# create the thread-local storage shared with the workers
locker = threading.local()
# create a thread pool that initializes each worker with the locker
executor = ThreadPoolExecutor(max_workers=2, initializer=initializer_worker, initargs=(locker,))
# dispatch tasks
futures = [executor.submit(task, locker) for _ in range(10)]
# wait for all tasks to complete
for future in futures:
    result = future.result()
    print(result)
# shut down the thread pool
executor.shutdown()
print('done')
```

A thread pool plus a per-thread "locker" (thread-local storage).

yourengod commented 10 months ago

I just remembered: in Python, only one thread in a process can run at a time (only the thread holding the GIL executes), and with multiple processes the global variables of one process are not affected by another, so no data conflict occurs.