shrivastavapankajj opened this issue 3 years ago
Hi @shrivastavapankajj
Could you tell which pipeline (single task, multi-task) and model (small, base) you are using? Thanks.
Hi Suraj,
I used the following, taken from your Colab:

```python
from pipelines import pipeline
import time

nlp = pipeline("question-generation")

text = ("Python is an interpreted, high-level, general-purpose programming language. "
        "Created by Guido van Rossum and first released in 1991, Python's design "
        "philosophy emphasizes code readability with its notable use of significant "
        "whitespace.")

start = time.time()
print("hello-NLP")
print(nlp(str(text)))
end = time.time()
print(end - start)
```
Without a GPU in Colab it takes 25 seconds; with a GPU, around 0.8 seconds.
On a 4 GB RAM Linux server, it takes 30 seconds.
I am not sure whether this response time will be acceptable. Kindly help.
Best Regards, Pankaj
The pipelines by default use small models, so if you want more speed-up I can think of the following approaches:

- Use the multi-task pipeline; it uses a single model to do both answer extraction and QG, so it will save memory.
- Use even smaller/distilled models. I've trained and uploaded a few distilled models on the Hugging Face hub: https://huggingface.co/models?filter=distilt5-qg
- Use onnx-runtime. I've added experimental onnx-runtime support. To try it, use the branch `onnx-support`, install the required dependencies (they are in the `requirements.txt` and `requirements_ort.txt` files), and just pass `onnx=True` to the pipeline. This should give a 1.5-1.7x speed-up on the CPU. (If you want to test onnx, test it on a local machine; for some reason it was slower than torch on Google Colab.)

Thanks for the tip!
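Putting those suggestions together, a minimal sketch of how the flags could be wired up. This is not the repo's API verbatim: `pipeline_fn` stands in for `pipelines.pipeline`, and the distilled model id is an assumption — pick an actual one from the distilt5-qg hub filter above. Only `onnx=True` (on the `onnx-support` branch) and the `model` argument come from the thread itself.

```python
# Hedged sketch of selecting the suggested speed-ups. `pipeline_fn` stands in
# for `pipelines.pipeline` from the question_generation repo.
def build_qg_pipeline(pipeline_fn, use_onnx=False, use_distilled=False):
    """Build a question-generation pipeline with optional speed-ups."""
    kwargs = {}
    if use_onnx:
        # Requires the onnx-support branch plus the requirements_ort.txt deps.
        kwargs["onnx"] = True
    if use_distilled:
        # HYPOTHETICAL model id -- choose a real one from
        # https://huggingface.co/models?filter=distilt5-qg
        kwargs["model"] = "valhalla/distilt5-qg-hl-6-4"
    return pipeline_fn("question-generation", **kwargs)

# Real usage would then be:
#   from pipelines import pipeline
#   nlp = build_qg_pipeline(pipeline, use_onnx=True)
```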
Hi, I have used the model in a Flask app hosted on Ubuntu with 2 GB RAM. The code works, but the prediction is slow and takes around 12 seconds. The same prediction takes less than a second in Colab. Please see the comparison.

On Flask it takes: [timing screenshot]

I am not sure why it is taking so long in Flask. Could you suggest a workaround?
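One common cause of this kind of Flask slowdown (offered here only as a guess, not something confirmed in this thread) is reconstructing the pipeline on every request instead of once at startup. A minimal sketch of the load-once pattern; the lambda is a stand-in for the real `pipelines.pipeline` call so the snippet is self-contained:

```python
import time

_NLP = None  # module-level cache so the model loads only once per process

def get_nlp():
    """Load the expensive pipeline on first use, then reuse it."""
    global _NLP
    if _NLP is None:
        # In the real Flask app this would be:
        #   from pipelines import pipeline
        #   _NLP = pipeline("question-generation")
        _NLP = lambda text: [{"answer": "stub", "question": "stub?"}]  # stand-in
    return _NLP

def predict(text):
    """Run question generation and return (result, elapsed_seconds)."""
    start = time.time()
    result = get_nlp()(text)
    return result, time.time() - start
```

A Flask route would call `predict(text)`; only the first request pays the model-load cost. On a 2 GB box it also helps to run a single worker process, since each worker loads its own copy of the model.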