Closed: hugobowne closed this issue 1 month ago
We could also use an LLM from Replicate, but I do want people to get experience combining models from different providers.
https://github.com/hugobowne/first-multimodal-genAI-app/pull/6
^ has options for
```python
REPLICATE_IMAGE_MODEL_ID_LS = [
    "black-forest-labs/flux-dev",
    "stability-ai/stable-diffusion-3",
]
REPLICATE_VIDEO_MODEL_ID_LS = [
    "lucataco/hotshot-xl:78b3a6257e16e4b241245d65c8b2b81ea2e1ff7ed4c55306b511509ddbfd327a",
    "deforum/deforum_stable_diffusion:e22e77495f2fb83c34d5fae2ad8ab63c0a87b6b573b6208e1535b23b89ea66d6",
]
```
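For context, a Replicate model ID is `owner/name`, optionally pinned to an exact version hash after a colon (as the video models above are). A minimal sketch of running one of these with the Replicate Python client; `split_model_id` and `generate_image` are hypothetical helper names, and the actual call needs `REPLICATE_API_TOKEN` set:

```python
import os

IMAGE_MODEL = "black-forest-labs/flux-dev"  # unpinned ref
VIDEO_MODEL = (  # pinned to an exact version hash
    "lucataco/hotshot-xl:"
    "78b3a6257e16e4b241245d65c8b2b81ea2e1ff7ed4c55306b511509ddbfd327a"
)


def split_model_id(model_id: str):
    """Split 'owner/name:version' into (ref, version); version is None if unpinned."""
    ref, _, version = model_id.partition(":")
    return ref, version or None


def generate_image(prompt: str, model_id: str = IMAGE_MODEL):
    """Run a Replicate text-to-image model (network call; needs an API token)."""
    import replicate  # pip install replicate

    return replicate.run(model_id, input={"prompt": prompt})


# Only hit the network if a token is actually configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    print(generate_image("a watercolor fox reading a book"))
```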
oh gah @emattia I thought we removed all models that need a credit card, but it looks like we still have a Hugging Face model?
```python
# Excerpted from the app; assumes streamlit (st), aiohttp, pandas (pd), os,
# time, and uuid are imported, and that HF_BARK_ENDPOINT, bark_api_headers,
# and AUDIO_DATA_SINK are defined at module level.
async def text_to_audio(text: str, src: str) -> None:
    st.session_state["running_audio_job"] = True
    st.session_state[f"{src}_audio_bytes"] = None
    print("[DEBUG] Generating audio...")
    t0 = time.time()
    async with aiohttp.ClientSession() as session:
        async with session.post(
            HF_BARK_ENDPOINT, headers=bark_api_headers, json={"inputs": text}
        ) as response:
            tf = time.time()
            print(f"[DEBUG] text_to_audio request took {tf - t0:.2f} seconds")
            if response.status != 200:
                st.session_state["running_audio_job"] = False
                raise Exception(
                    f"Request failed with status code {response.status}: "
                    f"{await response.text()}"
                )
            # Read the body once and reuse it, rather than re-reading the response.
            audio_bytes = await response.read()
            out_dir = os.path.join(AUDIO_DATA_SINK, src)
            os.makedirs(out_dir, exist_ok=True)
            out_path = os.path.join(out_dir, f"{uuid.uuid4().hex}_audio.wav")
            with open(out_path, "wb") as f:
                f.write(audio_bytes)
            # Log one eval row for this generation.
            row = {
                "text": text,
                "date": pd.Timestamp.now(),
                "model": "suno/bark",
                "provider": "Hugging Face",
                "client_time": tf - t0,
            }
            st.session_state["audio_gen_evals_df"] = pd.concat(
                [st.session_state["audio_gen_evals_df"], pd.DataFrame(row, index=[0])],
                ignore_index=True,
            )
            # src is 'user' or 'llm'; store the bytes under the matching key.
            st.session_state[f"{src}_audio_bytes"] = audio_bytes
            st.session_state["running_audio_job"] = False
```
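Side note: the eval-logging step in that function can be pulled out as a pure pandas helper (no Streamlit), which makes it easy to test in isolation; `append_eval_row` is a hypothetical name for this sketch:

```python
import pandas as pd


def append_eval_row(evals_df: pd.DataFrame, text: str, client_time: float) -> pd.DataFrame:
    """Append one audio-generation eval row, mirroring text_to_audio above."""
    row = {
        "text": text,
        "date": pd.Timestamp.now(),
        "model": "suno/bark",
        "provider": "Hugging Face",
        "client_time": client_time,
    }
    return pd.concat([evals_df, pd.DataFrame(row, index=[0])], ignore_index=True)


evals = append_eval_row(pd.DataFrame(), "hello world", 1.23)
print(evals.shape[0])  # 1
```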
Can we switch this out for a Replicate model (e.g. Bark), or is this free on HF?
yeah could use this? https://replicate.com/suno-ai/bark
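If we do swap, the change looks small. A hedged sketch, assuming `suno-ai/bark` on Replicate takes a text `prompt` input (check the model page for the exact schema); `build_tts_request` and `tts_via_replicate` are hypothetical names, and the real call needs `REPLICATE_API_TOKEN`:

```python
def build_tts_request(text: str, provider: str = "replicate") -> dict:
    """Describe the Bark TTS call for each provider (field names illustrative)."""
    if provider == "replicate":
        return {"model": "suno-ai/bark", "input": {"prompt": text}}
    if provider == "huggingface":
        # Mirrors the HF Inference Endpoint payload in text_to_audio above.
        return {"json": {"inputs": text}}
    raise ValueError(f"unknown provider: {provider}")


def tts_via_replicate(text: str):
    """Run Bark on Replicate (network call; needs REPLICATE_API_TOKEN)."""
    import replicate  # pip install replicate

    req = build_tts_request(text, provider="replicate")
    return replicate.run(req["model"], input=req["input"])
```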
Currently we're using:

- **Text Generation Models**
- **Text-to-Audio Model**
- **Text-to-Image Model**: `stability-ai/stable-diffusion-3` (Replicate), used for generating images from text prompts.
- **Text-to-Video Model**: `deforum/deforum_stable_diffusion` (Replicate), used for generating videos from text prompts.
- **Audio Transcription Model**: `whisper-1` (OpenAI), used for transcribing audio to text.
Now this is all fine (and fun!) but I want to be able to teach this to students without them needing to give credit card details to vendors!
Replicate has very kindly offered $100 of credits for students woohooo!
And we can do most of this using models on Replicate:
I'd like to use a ChatGPT, Claude, Mistral, or Gemini LLM, and Gemini has a free tier, so let's maybe use that! It would be cool to show several LLMs, and we should definitely figure out a way to teach such things :)
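A sketch of what the Gemini path plus a multi-LLM demo could look like: a small registry where every provider is just a prompt-to-text callable, so adding another LLM is one line. Assumes the `google-generativeai` client and a `GOOGLE_API_KEY` env var; the model name (`gemini-1.5-flash`) will drift over time:

```python
import os


def ask_gemini(prompt: str) -> str:
    """Query Gemini on its free tier (network call; needs GOOGLE_API_KEY)."""
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text


# Registry pattern: each entry maps a provider name to a prompt -> text callable,
# so showing several LLMs side by side is a one-line addition per provider.
LLM_PROVIDERS = {
    "gemini": ask_gemini,
    # "openai": ..., "anthropic": ..., "mistral": ... would follow the same shape
}


def ask(provider: str, prompt: str) -> str:
    return LLM_PROVIDERS[provider](prompt)
```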