Closed: hugobowne closed this issue 1 month ago
We could also use an LLM from Replicate, but I do want people to get experience combining models from different providers.
https://github.com/hugobowne/first-multimodal-genAI-app/pull/6
^ has options for
```python
REPLICATE_IMAGE_MODEL_ID_LS = [
    "black-forest-labs/flux-dev",
    "stability-ai/stable-diffusion-3",
]
REPLICATE_VIDEO_MODEL_ID_LS = [
    "lucataco/hotshot-xl:78b3a6257e16e4b241245d65c8b2b81ea2e1ff7ed4c55306b511509ddbfd327a",
    "deforum/deforum_stable_diffusion:e22e77495f2fb83c34d5fae2ad8ab63c0a87b6b573b6208e1535b23b89ea66d6",
]
```
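For context, a Replicate model ID is `owner/name`, optionally pinned to an exact version hash after a colon (as the video models above are). A minimal sketch of running one of these with the Replicate Python client; `split_model_id` and `generate_image` are hypothetical helper names, and the actual call needs `REPLICATE_API_TOKEN` set:

```python
import os

IMAGE_MODEL = "black-forest-labs/flux-dev"  # unpinned ref
VIDEO_MODEL = (  # pinned to an exact version hash
    "lucataco/hotshot-xl:"
    "78b3a6257e16e4b241245d65c8b2b81ea2e1ff7ed4c55306b511509ddbfd327a"
)


def split_model_id(model_id: str):
    """Split 'owner/name:version' into (ref, version); version is None if unpinned."""
    ref, _, version = model_id.partition(":")
    return ref, version or None


def generate_image(prompt: str, model_id: str = IMAGE_MODEL):
    """Run a Replicate text-to-image model (network call; needs an API token)."""
    import replicate  # pip install replicate

    return replicate.run(model_id, input={"prompt": prompt})


# Only hit the network if a token is actually configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    print(generate_image("a watercolor fox reading a book"))
```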
oh gah @emattia I thought we removed all models that need a credit card, but it looks like we still have a Hugging Face model?
```python
# Excerpted from the app; assumes streamlit (st), aiohttp, pandas (pd), os,
# time, and uuid are imported, and that HF_BARK_ENDPOINT, bark_api_headers,
# and AUDIO_DATA_SINK are defined at module level.
async def text_to_audio(text: str, src: str) -> None:
    st.session_state["running_audio_job"] = True
    st.session_state[f"{src}_audio_bytes"] = None
    print("[DEBUG] Generating audio...")
    t0 = time.time()
    async with aiohttp.ClientSession() as session:
        async with session.post(
            HF_BARK_ENDPOINT, headers=bark_api_headers, json={"inputs": text}
        ) as response:
            tf = time.time()
            print(f"[DEBUG] text_to_audio request took {tf - t0:.2f} seconds")
            if response.status != 200:
                st.session_state["running_audio_job"] = False
                raise Exception(
                    f"Request failed with status code {response.status}: "
                    f"{await response.text()}"
                )
            # Read the body once and reuse it, rather than re-reading the response.
            audio_bytes = await response.read()
            out_dir = os.path.join(AUDIO_DATA_SINK, src)
            os.makedirs(out_dir, exist_ok=True)
            out_path = os.path.join(out_dir, f"{uuid.uuid4().hex}_audio.wav")
            with open(out_path, "wb") as f:
                f.write(audio_bytes)
            # Log one eval row for this generation.
            row = {
                "text": text,
                "date": pd.Timestamp.now(),
                "model": "suno/bark",
                "provider": "Hugging Face",
                "client_time": tf - t0,
            }
            st.session_state["audio_gen_evals_df"] = pd.concat(
                [st.session_state["audio_gen_evals_df"], pd.DataFrame(row, index=[0])],
                ignore_index=True,
            )
            # src is 'user' or 'llm'; store the bytes under the matching key.
            st.session_state[f"{src}_audio_bytes"] = audio_bytes
            st.session_state["running_audio_job"] = False
```
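Side note: the eval-logging step in that function can be pulled out as a pure pandas helper (no Streamlit), which makes it easy to test in isolation; `append_eval_row` is a hypothetical name for this sketch:

```python
import pandas as pd


def append_eval_row(evals_df: pd.DataFrame, text: str, client_time: float) -> pd.DataFrame:
    """Append one audio-generation eval row, mirroring text_to_audio above."""
    row = {
        "text": text,
        "date": pd.Timestamp.now(),
        "model": "suno/bark",
        "provider": "Hugging Face",
        "client_time": client_time,
    }
    return pd.concat([evals_df, pd.DataFrame(row, index=[0])], ignore_index=True)


evals = append_eval_row(pd.DataFrame(), "hello world", 1.23)
print(evals.shape[0])  # 1
```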
Can we switch this out for a Replicate model (e.g. Bark), or is this free on HF?
yeah could use this? https://replicate.com/suno-ai/bark
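If we do swap, the change looks small. A hedged sketch, assuming `suno-ai/bark` on Replicate takes a text `prompt` input (check the model page for the exact schema); `build_tts_request` and `tts_via_replicate` are hypothetical names, and the real call needs `REPLICATE_API_TOKEN`:

```python
def build_tts_request(text: str, provider: str = "replicate") -> dict:
    """Describe the Bark TTS call for each provider (field names illustrative)."""
    if provider == "replicate":
        return {"model": "suno-ai/bark", "input": {"prompt": text}}
    if provider == "huggingface":
        # Mirrors the HF Inference Endpoint payload in text_to_audio above.
        return {"json": {"inputs": text}}
    raise ValueError(f"unknown provider: {provider}")


def tts_via_replicate(text: str):
    """Run Bark on Replicate (network call; needs REPLICATE_API_TOKEN)."""
    import replicate  # pip install replicate

    req = build_tts_request(text, provider="replicate")
    return replicate.run(req["model"], input=req["input"])
```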
Currently we're using:

- **Text Generation Models**
- **Text-to-Audio Model**
- **Text-to-Image Model**: `stability-ai/stable-diffusion-3` (Replicate), used for generating images from text prompts.
- **Text-to-Video Model**: `deforum/deforum_stable_diffusion` (Replicate), used for generating videos from text prompts.
- **Audio Transcription Model**: `whisper-1` (OpenAI), used for transcribing audio to text.
Now this is all fine (and fun!) but I want to be able to teach this to students without them needing to give credit card details to vendors!
Replicate has very kindly offered $100 of credits for students woohooo!
And we can do most of this using models on Replicate:
I'd like to use a ChatGPT, Claude, Mistral, or Gemini LLM, and Gemini has a free tier, so let's maybe use that! It would be cool to show several LLMs, and we should definitely figure out a way to teach such things :)
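A sketch of what the Gemini path plus a multi-LLM demo could look like: a small registry where every provider is just a prompt-to-text callable, so adding another LLM is one line. Assumes the `google-generativeai` client and a `GOOGLE_API_KEY` env var; the model name (`gemini-1.5-flash`) will drift over time:

```python
import os


def ask_gemini(prompt: str) -> str:
    """Query Gemini on its free tier (network call; needs GOOGLE_API_KEY)."""
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text


# Registry pattern: each entry maps a provider name to a prompt -> text callable,
# so showing several LLMs side by side is a one-line addition per provider.
LLM_PROVIDERS = {
    "gemini": ask_gemini,
    # "openai": ..., "anthropic": ..., "mistral": ... would follow the same shape
}


def ask(provider: str, prompt: str) -> str:
    return LLM_PROVIDERS[provider](prompt)
```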