souzatharsis / podcastfy

An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
https://www.podcastfy.ai

Feature Request: Support Docker and OpenAI-compatible API #64

Open taowang1993 opened 5 days ago

taowang1993 commented 5 days ago

Hi, Podcastfy Team.

I would like to make a feature request for easier deployment and integration with other systems.

For example: in Dify (an agent-building platform similar to LangChain), I can currently generate a script and convert it into audio with TTS models by clicking the play button.

I would like to integrate Podcastfy with Dify so that I can generate podcasts.

[screenshot]

This will open up many opportunities.

For example, I can build an AI teacher that helps people learn new languages.

Podcastfy can be used to help students improve English listening comprehension.

As you can see in the screenshot, Dify can call any OpenAI-compatible APIs.

Thank you.

souzatharsis commented 5 days ago

Re: OpenAI-compatible API

Excellent feature request; this is a must for wide adoption. Could you please clarify whether, for your use case to work, we should implement the OpenAI TTS interface or the Chat interface?

We would appreciate it if you could share the specific API spec you are referring to; we should be able to push it pretty quickly.

Thanks for sharing your detailed use case with us.

taowang1993 commented 5 days ago

> OpenAI TTS interface or Chat interface

I think Podcastfy can go with the TTS API format: https://api.openai.com/v1/audio/speech

Python:

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="The quick brown fox jumped over the lazy dog.",
)
response.stream_to_file(speech_file_path)
```

curl:

```bash
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```

Node:

```javascript
import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();
```

OpenAI TTS API Reference: https://platform.openai.com/docs/api-reference/audio/createSpeech

ralyodio commented 4 days ago

We should support Ollama, for starters.

souzatharsis commented 4 days ago

@ralyodio Podcastfy now supports running local LLMs via llamafiles: https://github.com/souzatharsis/podcastfy/blob/main/usage/local_llm.md

Given that, what would be the added value of supporting Ollama?

We can move the Ollama discussion to a separate issue if there's value in it, and keep this issue focused on the OpenAI interface request.

Curious about your experience.

ralyodio commented 4 days ago

A lot of people already have Ollama running, and it exposes a REST API, so apps (like this one) can integrate with it easily.

brumar commented 2 days ago

@taowang1993 The input in your example would be the raw content, right? Not the transcript? Since I think exposing a REST API is out of scope for the moment (but @souzatharsis can prove me wrong), the idea is that if we have a Python API following the same signature as openai.audio.speech.create, it would be trivial to expose it as a REST API, right?

I feel this API would be a bit awkward, as it focuses on only a subset of the arguments Podcastfy normally uses. It's technically feasible (as an instance method of a new class, or even as a closure), and it would mesh very well with projects that want to expose this as a REST API, but it wouldn't be very useful for integrating Podcastfy into larger Python projects, and it would be something we'd have to maintain over time.

I think the documentation could instead present a recipe for creating a FastAPI endpoint that more or less respects the OpenAI openapi.json, with a real code snippet leveraging the current abstractions. That would be even more helpful for the kind of need you have, while not adding a new interface to maintain in the codebase. That's my 2 cents, anyway :)
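To make the recipe idea concrete, here is a minimal sketch of an OpenAI-compatible `POST /v1/audio/speech` endpoint. It uses only the Python standard library rather than FastAPI, and `generate_podcast_audio` is a hypothetical stub standing in for Podcastfy's real pipeline; a real recipe would call Podcastfy's actual abstractions there:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate_podcast_audio(text: str, voice: str) -> bytes:
    """Hypothetical stub: a real recipe would run Podcastfy's
    generation pipeline here and return the rendered MP3 bytes."""
    return b"ID3" + text.encode("utf-8")  # fake payload for illustration


class SpeechHandler(BaseHTTPRequestHandler):
    """Accepts the same JSON body as OpenAI's POST /v1/audio/speech."""

    def do_POST(self):
        if self.path != "/v1/audio/speech":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        # Mirror the OpenAI request shape: model, input, voice.
        audio = generate_podcast_audio(body["input"], body.get("voice", "alloy"))
        self.send_response(200)
        self.send_header("Content-Type", "audio/mpeg")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args):
        pass  # keep the example quiet
```

Serving it with `HTTPServer(("127.0.0.1", 8000), SpeechHandler).serve_forever()` would let the curl command from the earlier comment work unchanged, apart from swapping the base URL and dropping the auth header.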

taowang1993 commented 2 days ago

> The input in your example would be the raw content, right? Not the transcript?

In the context of Podcastfy, the "input" would be the raw text (or document) that is fed to Podcastfy and converted into a podcast.

In the context of OpenAI TTS, the "input" is the transcript that users want to convert into speech.

The reason I propose an OpenAI-compatible approach is that many AI systems already have the OpenAI client SDK built in.

If Podcastfy exposes an OpenAI-compatible API, it would be very easy to integrate into other systems, leading to wider adoption.

If the OpenAI API format is not a good fit for Podcastfy, then any other API format would also work.

brumar commented 1 day ago

Thanks for the explanations. That's convincing!