jina-ai / example-speech-to-image

An example of building a speech to image generation pipeline with Jina, Whisper and StableDiffusion

speech2image

Create realistic AI generated images from human voice

Leveraging OpenAI Whisper and Stable Diffusion in a cloud-native application powered by Jina

Under the hood, the Whisper and Stable Diffusion models are wrapped into Executors that turn them into self-contained microservices. Both microservices are then chained into a Flow. The Flow exposes a gRPC endpoint which accepts a DocumentArray as input.

This is an example of a multi-modal application that can be built with Jina.

How to use it?

Install the dependencies:

pip install -r requirements.txt
pip install -r executors/stablediffusion/requirements.txt
pip install -r executors/whisper/requirements.txt

Start the Flow locally (replace YOUR_HF_TOKEN with your Hugging Face access token):

JINA_MP_START_METHOD=spawn HF_TOKEN=YOUR_HF_TOKEN python flow.py

Or deploy the Flow on JCloud:

pip install jcloud
jc login
jc deploy flow.yml

Then launch the UI:

python ui.py

Or, if you started the Flow on JCloud:

python ui.py --host grpcs://FLOW_ID.wolf.jina.ai

You can also query the Flow programmatically:

from jina import Client
from docarray import Document

# connect to the locally running Flow's gRPC endpoint
client = Client(host='localhost:54322')

# send an audio file through the speech-to-image pipeline
docs = client.post('/', inputs=[Document(uri='audio.wav')])

# the generated images come back as matches of the input Document
for img in docs[0].matches:
    img.load_uri_to_image_tensor()

# display all generated images as a single sprite image
docs[0].matches.plot_image_sprites()