seanoliver / audioflare

An all-in-one AI audio playground using Cloudflare AI Workers to transcribe, analyze, summarize, and translate any audio file.
https://audioflare.seanoliver.dev/
MIT License

Audioflare

View Demo · Report Bug · Request Feature

![Top Languages](https://img.shields.io/github/languages/top/seanoliver/audioflare) ![GitHub repo size](https://img.shields.io/github/repo-size/seanoliver/audioflare) ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/seanoliver/audioflare) ![GitHub contributors](https://img.shields.io/github/contributors/seanoliver/audioflare) ![GitHub last commit](https://img.shields.io/github/last-commit/seanoliver/audioflare) ![GitHub issues](https://img.shields.io/github/issues/seanoliver/audioflare) ![GitHub](https://img.shields.io/github/license/seanoliver/audioflare)
Table of Contents
  1. About This Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Contact

About This Project

Audioflare grew out of a side project of mine at Smol AI, aimed at exploring the capabilities of Cloudflare AI workers. It demonstrates a practical use case by chaining several AI workers to process an audio file of up to 30 seconds. Here’s a walkthrough of the core functionality:

  1. Transcription:

    • First, the audio file is transcribed using Cloudflare's Speech to Text worker, which is built on OpenAI's Whisper model.
  2. Summarization:

    • The transcribed text is then summarized using Cloudflare's LLM AI worker, based on Meta's llama-2-7b-chat-int8 model. Note that this model struggles with lengthy prompts.
  3. Sentiment Analysis:

    • Sentiment analysis is performed on the transcribed text using Cloudflare's Text Classification AI worker, which uses Hugging Face’s distilbert-sst-2-int8 model.
  4. Translation:

    • The transcribed text is translated into nine languages using Cloudflare's Translation AI workers, which utilize Meta's m2m100-1.2b model.
  5. Performance Metrics:

    • The time taken to process each request is measured and shown, giving a view of how each worker performs.
  6. Observability and Monitoring:

    • The Cloudflare AI Gateway is used to add observability and monitoring to the AI workers, including analytics, logging, caching, and rate limiting.
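The steps above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the code Audioflare actually ships: `ModelRunner`, `timed`, and `runPipeline` are hypothetical names, the `@cf/...` model identifiers and the input/output field names are assumed from Cloudflare's public model catalog, and running the later steps concurrently is just one possible arrangement. The target languages are passed in as a parameter rather than hard-coded.

```typescript
// Hypothetical helper types and names; not taken from the Audioflare source.
// A ModelRunner sends `input` to the named model and resolves with its result.
type ModelRunner = (model: string, input: unknown) => Promise<unknown>;

interface Timed<T> {
  step: string;
  output: T;
  elapsedMs: number;
}

interface Sentiment {
  label: string;
  score: number;
}

// Run one step and record how long it took (step 5, performance metrics).
async function timed<T>(step: string, fn: () => Promise<T>): Promise<Timed<T>> {
  const start = Date.now();
  const output = await fn();
  return { step, output, elapsedMs: Date.now() - start };
}

async function runPipeline(
  audio: Uint8Array,
  sourceLang: string,
  targetLangs: string[],
  run: ModelRunner,
) {
  // Step 1 must finish first: every later step consumes the transcript.
  const transcription = await timed("transcription", async () => {
    const res = (await run("@cf/openai/whisper", {
      audio: Array.from(audio),
    })) as { text: string };
    return res.text;
  });
  const text = transcription.output;

  // Steps 2-4 depend only on the transcript, so they can run concurrently.
  const [summary, sentiment, translations] = await Promise.all([
    timed("summarization", async () => {
      const res = (await run("@cf/meta/llama-2-7b-chat-int8", {
        prompt: `Summarize the following text in one sentence:\n\n${text}`,
      })) as { response: string };
      return res.response;
    }),
    timed("sentiment", async () =>
      (await run("@cf/huggingface/distilbert-sst-2-int8", { text })) as Sentiment[]),
    Promise.all(
      targetLangs.map((lang) =>
        timed(`translation:${lang}`, async () => {
          const res = (await run("@cf/meta/m2m100-1.2b", {
            text,
            source_lang: sourceLang,
            target_lang: lang,
          })) as { translated_text: string };
          return res.translated_text;
        }),
      ),
    ),
  ]);

  return { transcription, summary, sentiment, translations };
}
```

Injecting the `run` callback keeps the orchestration independent of how requests are sent (a Workers binding, the REST API, or the AI Gateway), and makes it easy to exercise with a stub.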

The current setup has limitations: transcription is limited to 30 seconds of audio, and the LLM's summaries could be better.

Audioflare shows the potential of Cloudflare AI workers: because every model is called through the same standardized API request framework, multi-step AI tasks become simpler to build. Although the models in use have limitations and are marked as 'beta' by Cloudflare, there's a clear path toward enhancing this project as more models become available.
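To illustrate what a standardized request framework means in practice, here is a minimal sketch of a request builder in which only the model name and the JSON input change between steps. The helper `buildAiRequest` and its parameter names are hypothetical, and the two URL shapes are assumptions based on Cloudflare's public REST API and AI Gateway documentation rather than on the Audioflare source.

```typescript
// Hypothetical helper; names are illustrative, not from the Audioflare source.
interface AiRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildAiRequest(
  accountId: string,
  apiToken: string,
  model: string,
  input: unknown,
  gatewaySlug?: string, // when set, route the call through an AI Gateway
): AiRequest {
  // Assumed URL shapes: direct Workers AI REST endpoint vs. AI Gateway endpoint.
  const url = gatewaySlug
    ? `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewaySlug}/workers-ai/${model}`
    : `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;

  return {
    url,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  };
}
```

A caller would then send it with something like `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Swapping sentiment analysis for translation only changes the `model` and `input` arguments.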

Contributions are welcome. Feel free to submit pull requests and issues as you experiment with Audioflare. This project is intended to serve as a template for learning and working with Cloudflare AI workers. It doesn’t currently include Cloudflare's Image Classification or Text Embedding workers because they aren't relevant to the audio use case, but it’s a step towards understanding and using the Cloudflare AI ecosystem better.

As Cloudflare broadens its model support, I look forward to refining Audioflare, making it a more robust and informative template for the developer community.

Demo

Audioflare Demo

Key Features

Built With

This project was built in 2023 using the following technologies.

See package.json for a full list of dependencies.

Getting Started

To get a local copy up and running, follow these steps.

  1. Clone this repository

    git clone https://github.com/seanoliver/audioflare.git
  2. Install dependencies

    cd audioflare
    bun install
  3. Create a Cloudflare account

  4. Install Wrangler and log in

    bun add wrangler --dev
    bunx wrangler login
  5. Rename .env.example to .env and follow the instructions linked in the comments to find each of the required keys and values.

  6. Run the app

    bun dev
  7. Go to http://localhost:3000 to check it out

Contributing

This is a great project for learning Cloudflare, AI Workers, and simple Next.js API Routes. Feel free to fork this repo and make it your own. If you have any questions or suggestions, please contact me!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Sean Oliver - @SeanOliver - helloseanoliver@gmail.com

Project Link: https://github.com/seanoliver/audioflare

Live Demo: https://audioflare.seanoliver.dev/
