Whisper is a state-of-the-art automatic speech recognition (ASR) system from OpenAI, trained on 680,000 hours of multilingual and multitask supervised data collected from the web. This large and diverse dataset leads to improved robustness to accents, background noise, and technical language. In addition, it enables transcription in multiple languages, as well as translation from those languages into English. OpenAI released the models and code to serve as a foundation for building useful applications that leverage speech recognition.
In the Dockerfile we will add the following lines:
FROM python:3.10-slim
WORKDIR /python-docker
COPY requirements.txt requirements.txt
RUN apt-get update && apt-get install -y git ffmpeg
RUN pip3 install -r requirements.txt
RUN pip3 install "git+https://github.com/openai/whisper.git"
COPY . .
EXPOSE 5000
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]
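The Dockerfile copies a requirements.txt into the image, but its contents aren't shown above. A minimal version only needs Flask, since Whisper and its dependencies (including torch) are installed from GitHub in a later layer:

```text
flask
```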
from flask import Flask, abort, request
from tempfile import NamedTemporaryFile
import whisper
import torch

# Check if an NVIDIA GPU is available and pick the device accordingly.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Whisper model:
model = whisper.load_model("base", device=DEVICE)

app = Flask(__name__)


@app.route("/")
def hello():
    return "Whisper Hello World!"


@app.route("/whisper", methods=["POST"])
def handler():
    if not request.files:
        # If the user didn't submit any files, return a 400 (Bad Request) error.
        abort(400)

    # For each file, store the results in a list of dictionaries.
    results = []

    # Loop over every file that the user submitted.
    for filename, handle in request.files.items():
        # Create a temporary file.
        # The location of the temporary file is available in `temp.name`.
        temp = NamedTemporaryFile()
        # Write the user's uploaded file to the temporary file.
        # The file will get deleted when it drops out of scope.
        handle.save(temp)
        # Get the transcript of the temporary file.
        result = model.transcribe(temp.name)
        # Store the result object for this file.
        results.append({
            'filename': filename,
            'transcript': result['text'],
        })

    # This will be automatically converted to JSON.
    return {'results': results}
Build the image and run the container:
docker build -t whisper-api .
docker run -p 5000:5000 whisper-api
If you are running into errors on macOS, please add RUN pip3 install markupsafe==2.0.1 to the Dockerfile.
If you prefer Podman, you can build and run the image like this:
cd /tmp
git clone https://github.com/lablab-ai/whisper-api-flask whisper
cd whisper
mv Dockerfile Containerfile
podman build --network="host" -t whisper .
podman run --network="host" -p 5000:5000 whisper
Then run:
curl -F "file=@/path/to/filename.mp3" http://localhost:5000/whisper
As the README notes, send a POST request to http://localhost:5000/whisper with the audio file attached; the body should be form-data, for example:
curl -F "file=@/path/to/file" http://localhost:5000/whisper
As a result you should get a JSON object with the transcript in it.
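The response body has the shape `{'results': [...]}` built at the end of the handler above; a client might unpack it like this (the transcript text below is made up for illustration, not real model output):

```python
import json

# Illustrative response body from the /whisper endpoint
# (the transcript text here is made up, not real model output).
raw = '{"results": [{"filename": "file", "transcript": "Hello world."}]}'

data = json.loads(raw)
for item in data["results"]:
    print(f'{item["filename"]}: {item["transcript"]}')
```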
This API can be deployed anywhere Docker can be used. Just keep in mind that this setup currently uses the CPU to process the audio files. If you want to use a GPU, you need to modify the Dockerfile and share the GPU with the container. I won't go into this deeper, as this is an introduction; see the Docker GPU documentation for details.
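As a rough sketch of the GPU route: with the NVIDIA Container Toolkit installed on the host, you would base the image on a CUDA-enabled image instead of python:3.10-slim and pass the `--gpus` flag at run time (the base image tag below is illustrative):

```shell
# Requires the NVIDIA Container Toolkit on the host.
# In the Dockerfile, swap the base image for a CUDA-enabled one,
# e.g. FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04, then run:
docker run --gpus all -p 5000:5000 whisper-api
```

The Flask app above already selects "cuda" automatically when torch.cuda.is_available() returns True, so no code changes are needed.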
You can find the whole code [here](https://github.com/lablab-ai/whisper-api-flask).
Thank you for reading! If you enjoyed this tutorial, you can find more and continue reading on our tutorial page.
On the lablab Discord, we discuss this repo and many other topics related to artificial intelligence! Check out our upcoming Artificial Intelligence Hackathons.