import numpy as np

all_tokens = []
for msg in message:
    # clean each token; drop tokens that come back empty
    msg_tokens = []
    for t in msg.get("tokens", []):
        text = self._replace_number_blank(t.text)
        if text != '':
            msg_tokens.append(text)
    # flatten the cleaned tokens into one string, then split each
    # message into single characters before encoding
    all_tokens.append(list(''.join(msg_tokens)))
    # all_tokens.append(''.join(msg_tokens))
logger.info("bert vectors featurizer finished")
try:
    # one blocking round-trip to the bert-as-service server
    bert_embedding = self.bc.encode(all_tokens, is_tokenized=True)
    bert_embedding = np.squeeze(bert_embedding)
except Exception:
    # the except clause is cut off in the issue; log and re-raise as a placeholder
    logger.exception("bert encoding failed")
    raise
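For context, self.bc above is presumably a bert-as-service BertClient; its construction is not shown in the issue. A minimal stand-alone sketch of the same call, where the host and ports are placeholders rather than the reporter's real settings:

from bert_serving.client import BertClient

# hypothetical client setup; ip, port, and port_out are placeholders
bc = BertClient(ip='localhost', port=5555, port_out=5556)

# pre-tokenized input: one list of tokens (here, single characters) per message
vecs = bc.encode([['今', '天', '天', '气', '好']], is_tokenized=True)
# with the server's default REDUCE_MEAN pooling, vecs has shape (num_messages, 768)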
Prerequisites

- Are you running the latest bert-as-service?
- Did you follow the installation and the usage instructions in README.md?
- Did you check the FAQ list in README.md?

System information

- bert-as-service version: 1.9.1

Description
I'm starting the server with the bert-serving-start command and calling it from the featurizer code shown at the top of this issue.
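For reference, bert-as-service also documents starting the server from Python through its BertServer API; the model path and worker count below are placeholders, not the reporter's actual flags:

from bert_serving.server import BertServer
from bert_serving.server.helper import get_args_parser

# placeholder model path and worker count
args = get_args_parser().parse_args([
    '-model_dir', '/path/to/chinese_L-12_H-768_A-12',
    '-num_worker', '2',
])
server = BertServer(args)
server.start()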
Then this issue shows up: I want to improve concurrency in the production environment, where each user sends one sentence per request. I load-tested with JMeter: at about 10 requests per second the service keeps up, but when the load goes up to 20 per second, this call:

bert_embedding = self.bc.encode(all_tokens, is_tokenized=True)

blocks and takes a long time.
How could I raise the throughput to 50 requests per second? Should I set the -http_max_connect 50 parameter in the server-side config?

Thanks,
weizhen
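For what it's worth, the throughput levers that bert-as-service itself documents are the server's -num_worker flag and the client-side ConcurrentBertClient; -http_max_connect only raises the connection cap of the optional HTTP proxy, and since the featurizer above uses the native BertClient it likely would not help here. A minimal sketch, assuming a recent bert-serving-client release; the ports and pool size are placeholders:

from bert_serving.client import ConcurrentBertClient

# Server side, for reference (the worker count is a placeholder):
#   bert-serving-start -model_dir /path/to/model -num_worker=4
# Each worker handles one encode() request at a time, so more workers
# mean more requests served in parallel.

# ConcurrentBertClient keeps a pool of BertClient connections so that
# requests coming from many threads do not serialize on a single socket.
# max_concurrency and the ports are assumptions, not tested settings.
bc = ConcurrentBertClient(max_concurrency=10, port=5555, port_out=5556)

def embed(token_lists):
    # token_lists: one list of tokens per message, e.g. [['今', '天', '好']]
    return bc.encode(token_lists, is_tokenized=True)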
...