shrivastavapankajj opened this issue 3 years ago
Hi @shrivastavapankajj
Could you tell which pipeline (single task, multi-task) and model (small, base) you are using? Thanks.
Hi Suraj,
I used the following, taken from your Colab:

```python
from pipelines import pipeline
import time

nlp = pipeline("question-generation")

text = ("Python is an interpreted, high-level, general-purpose programming language. "
        "Created by Guido van Rossum and first released in 1991, Python's design "
        "philosophy emphasizes code readability with its notable use of significant "
        "whitespace.")

start = time.time()
print("hello-NLP")
print(nlp(str(text)))
end = time.time()
print(end - start)
```
Without a GPU in Colab it takes 25 seconds; with a GPU, around 0.8 seconds.
On a 4 GB RAM Linux server, it takes 30 seconds.
I am not sure whether this response time will be acceptable. Kindly help.
Best Regards, Pankaj
The pipelines by default use small models, so if you want more speed-up I can think of the following approaches:

- Use the multi-task pipeline; it uses a single model to do both answer extraction and QG, so it will save memory.
- Use even smaller/distilled models. I've trained and uploaded a few distilled models on the Hugging Face hub: https://huggingface.co/models?filter=distilt5-qg
- Use onnx-runtime. I've added experimental onnx-runtime support. To try it, use the branch `onnx-support`, install the required dependencies (they are in the `requirements.txt` and `requirements_ort.txt` files), and just pass `onnx=True` to the pipeline. This should give a 1.5-1.7x speed-up on the CPU. (If you want to test onnx, test it on a local machine; for some reason it was slower than torch on Google Colab.)

Thanks for the tip!
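Putting those suggestions together, a minimal sketch of how the flags could be wired up. This is not the repo's API verbatim: `pipeline_fn` stands in for `pipelines.pipeline`, and the distilled model id is an assumption — pick an actual one from the distilt5-qg hub filter above. Only `onnx=True` (on the `onnx-support` branch) and the `model` argument come from the thread itself.

```python
# Hedged sketch of selecting the suggested speed-ups. `pipeline_fn` stands in
# for `pipelines.pipeline` from the question_generation repo.
def build_qg_pipeline(pipeline_fn, use_onnx=False, use_distilled=False):
    """Build a question-generation pipeline with optional speed-ups."""
    kwargs = {}
    if use_onnx:
        # Requires the onnx-support branch plus the requirements_ort.txt deps.
        kwargs["onnx"] = True
    if use_distilled:
        # HYPOTHETICAL model id -- choose a real one from
        # https://huggingface.co/models?filter=distilt5-qg
        kwargs["model"] = "valhalla/distilt5-qg-hl-6-4"
    return pipeline_fn("question-generation", **kwargs)

# Real usage would then be:
#   from pipelines import pipeline
#   nlp = build_qg_pipeline(pipeline, use_onnx=True)
```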
Hi, I have used the model in a Flask app hosted on Ubuntu with 2 GB RAM. The code works, but the prediction is slow and takes around 12 seconds. The same prediction takes less than a second in Colab. Please see the comparison.

On Flask it takes: [timing screenshot]

I am not sure why it is taking so long in Flask. Could you suggest a workaround?
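One common cause of this kind of Flask slowdown (offered here only as a guess, not something confirmed in this thread) is reconstructing the pipeline on every request instead of once at startup. A minimal sketch of the load-once pattern; the lambda is a stand-in for the real `pipelines.pipeline` call so the snippet is self-contained:

```python
import time

_NLP = None  # module-level cache so the model loads only once per process

def get_nlp():
    """Load the expensive pipeline on first use, then reuse it."""
    global _NLP
    if _NLP is None:
        # In the real Flask app this would be:
        #   from pipelines import pipeline
        #   _NLP = pipeline("question-generation")
        _NLP = lambda text: [{"answer": "stub", "question": "stub?"}]  # stand-in
    return _NLP

def predict(text):
    """Run question generation and return (result, elapsed_seconds)."""
    start = time.time()
    result = get_nlp()(text)
    return result, time.time() - start
```

A Flask route would call `predict(text)`; only the first request pays the model-load cost. On a 2 GB box it also helps to run a single worker process, since each worker loads its own copy of the model.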