Audioflare

An all-in-one AI audio playground using Cloudflare AI Workers to transcribe, analyze, summarize, and translate any audio file.

View Demo · Report Bug · Request Feature

![Top Languages](https://img.shields.io/github/languages/top/seanoliver/audioflare) ![GitHub repo size](https://img.shields.io/github/repo-size/seanoliver/audioflare) ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/seanoliver/audioflare) ![GitHub contributors](https://img.shields.io/github/contributors/seanoliver/audioflare) ![GitHub last commit](https://img.shields.io/github/last-commit/seanoliver/audioflare) ![GitHub issues](https://img.shields.io/github/issues/seanoliver/audioflare) ![GitHub](https://img.shields.io/github/license/seanoliver/audioflare)

Table of Contents

About This Project
Getting Started
Usage
Contributing
License
Contact

About This Project

Audioflare grew out of my side-project work at Smol AI, where I wanted to explore what Cloudflare AI workers can do. It chains several AI workers together to process an audio file, transcribing up to the first 30 seconds. Here’s a walkthrough of the core functionality:

Transcription:
- The audio file is first transcribed using Cloudflare's Speech to Text worker, which is built on OpenAI's Whisper model.
Summarization:
- The transcribed text is then summarized using Cloudflare's LLM AI worker, based on Meta's llama-2-7b-chat-int8 model. Note that this model struggles with lengthy prompts.
Sentiment Analysis:
- Sentiment analysis is performed on the transcribed text using Cloudflare's Text Classification AI worker, based on Hugging Face’s distilbert-sst-2-int8 model.
Translation:
- The transcribed text is translated into nine languages using Cloudflare's Translation AI workers, which use Meta's m2m100-1.2b model.
Performance Metrics:
- The time taken to process each request is measured and displayed, giving a rough view of how each worker performs.
Observability and Monitoring:
- The Cloudflare AI Gateway is used to add observability and monitoring to the AI workers, including analytics, logging, caching, and rate limiting.
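
The walkthrough above can be sketched as a small orchestration function. This is an illustrative outline, not the code Audioflare actually ships: `RunModel`, `processAudio`, and `PipelineResult` are hypothetical names, the model IDs and output fields (`text`, `response`, `translated_text`) are assumed from Cloudflare's model catalog and should be checked against the current docs, and the source language is assumed to be English. The sketch runs summarization, sentiment analysis, and translation concurrently because each depends only on the transcript; the app itself may order them differently.

```typescript
// Hypothetical runner: takes a model ID and an input object, resolves to the model output.
// A real implementation would wrap a Workers AI request; it is injected here so the
// orchestration logic can be read (and tested) without network access.
type RunModel = (model: string, input: Record<string, unknown>) => Promise<any>;

// Model IDs assumed from Cloudflare's catalog naming for the models named above.
const MODELS = {
  transcribe: "@cf/openai/whisper",
  summarize: "@cf/meta/llama-2-7b-chat-int8",
  sentiment: "@cf/huggingface/distilbert-sst-2-int8",
  translate: "@cf/meta/m2m100-1.2b",
};

interface PipelineResult {
  transcript: string;
  summary: string;
  sentiment: Array<{ label: string; score: number }>;
  translations: Record<string, string>;
}

async function processAudio(
  run: RunModel,
  audio: number[],
  targetLangs: string[],
): Promise<PipelineResult> {
  // Step 1: every later step needs the transcript, so transcription runs first.
  const transcription = await run(MODELS.transcribe, { audio });
  const transcript: string = transcription.text;

  // Steps 2-4 depend only on the transcript, so this sketch runs them concurrently.
  const [summaryOut, sentiment, translatedOuts] = await Promise.all([
    run(MODELS.summarize, { prompt: `Summarize this text:\n${transcript}` }),
    run(MODELS.sentiment, { text: transcript }),
    Promise.all(
      targetLangs.map((lang) =>
        run(MODELS.translate, { text: transcript, source_lang: "english", target_lang: lang }),
      ),
    ),
  ]);

  // Pair each target language with its translated text.
  const translations: Record<string, string> = {};
  targetLangs.forEach((lang, i) => {
    translations[lang] = translatedOuts[i].translated_text;
  });

  return { transcript, summary: summaryOut.response, sentiment, translations };
}
```

Passing the runner in as a parameter keeps the flow independent of how requests are sent, so the same function works whether calls go straight to Workers AI or through the AI Gateway.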

The current setup has its limitations: transcription is confined to the first 30 seconds of audio, and the LLM's summarization quality leaves room for improvement.

Audioflare shows the potential of Cloudflare AI workers: because every model is called through the same API request format, multi-step AI workflows become simpler to build. Although the models in use have limitations and are marked as 'beta' by Cloudflare, there's a clear path toward improving this project as more models become available.
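
To make the "same request format" point concrete, here is a minimal sketch of a request builder for text-based models. `buildRunRequest` and `AiRequest` are hypothetical names, and the two URL patterns (the Workers AI REST endpoint and the AI Gateway endpoint) are written from my understanding of Cloudflare's documentation, so verify them before relying on this. Audio input for transcription is typically sent as binary data rather than JSON and is left out here.

```typescript
// Plain description of an HTTP request; pass its fields to fetch() or any HTTP client.
interface AiRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request for any text model. Only the model ID and input object change
// between summarization, sentiment analysis, and translation.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  input: Record<string, unknown>,
  gatewaySlug?: string, // optional: route through an AI Gateway with this name
): AiRequest {
  // Assumed URL patterns; check Cloudflare's docs for the current format.
  const url = gatewaySlug
    ? `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewaySlug}/workers-ai/${model}`
    : `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;

  return {
    url,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  };
}

// Placeholder credentials; the same builder serves two different models.
const summarizeRequest = buildRunRequest(
  "ACCOUNT_ID",
  "API_TOKEN",
  "@cf/meta/llama-2-7b-chat-int8",
  { prompt: "Summarize this text: ..." },
);
const sentimentRequest = buildRunRequest(
  "ACCOUNT_ID",
  "API_TOKEN",
  "@cf/huggingface/distilbert-sst-2-int8",
  { text: "I love this project" },
  "my-gateway",
);
```

Returning a plain object instead of calling `fetch` directly keeps the builder easy to inspect and leaves the choice of HTTP client to the caller.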

Contributions are welcome. Feel free to submit pull requests and issues as you experiment with Audioflare. This project is intended to serve as a template for learning and working with Cloudflare AI workers. It doesn’t currently include Cloudflare's Image Classification or Text Embedding workers because they aren’t relevant to the audio use case, but it’s a step towards understanding and using the Cloudflare AI ecosystem better.

As Cloudflare broadens its model support, I look forward to refining Audioflare, making it a more robust and informative template for the developer community.

Demo

Audioflare Demo

Key Features

Audio Processing:
- Users can upload an audio file for processing.
  - Drag and drop a local audio file from their computer.
  - Alternatively, drag and drop one of the three sample audio files provided on the main page and in this repo.
- Audio files longer than 30 seconds are supported, but only the first 30 seconds will be transcribed.
- Audio transcription is handled by Cloudflare's Speech to Text worker (based on OpenAI's Whisper model).
Text Summarization:
- Transcribed text is summarized using Cloudflare's LLM AI worker (based on Meta's llama-2-7b-chat-int8 model).
Sentiment Analysis:
- Sentiment analysis is performed on the transcribed text using Cloudflare's Text Classification AI worker (based on Hugging Face’s distilbert-sst-2-int8 model).
Translation:
- Transcribed text is translated into nine different languages using Cloudflare's Translation AI workers (based on Meta's m2m100-1.2b model).
Performance Metrics:
- Time taken for each request to be processed is calculated and displayed.
Observability and Monitoring:
- Uses Cloudflare AI Gateway to add observability and monitoring to the AI workers:
  - Analytics: View metrics like the number of requests and tokens.
  - Logging: Monitor requests and errors.
  - Caching: Serve requests from Cloudflare’s cache for faster response and cost savings.
  - Rate Limiting: Control application scaling by limiting the number of received requests.
Learning and Exploration:
- Audioflare serves as a template for learning and working with Cloudflare AI workers.
- Users can explore the functionality of several Cloudflare AI workers. The Image Classification and Text Embedding workers are not integrated because they are not relevant to the audio use case.
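
The Performance Metrics feature above comes down to timing each request. A minimal sketch of one way to do that follows; `timed` and `Timed` are hypothetical names and this is not necessarily how Audioflare measures its requests. The clock is passed in as a parameter (defaulting to `Date.now`) so the helper can be checked with fixed timestamps.

```typescript
interface Timed<T> {
  result: T;
  elapsedMs: number;
}

// Runs an async task and reports how long it took in milliseconds.
async function timed<T>(
  task: () => Promise<T>,
  now: () => number = Date.now,
): Promise<Timed<T>> {
  const start = now();
  const result = await task();
  return { result, elapsedMs: now() - start };
}
```

Wrapping each model call, for example `await timed(() => callModel(input))` with a hypothetical `callModel`, yields both the model output and the elapsed time to display next to it.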

Built With

This project was built in 2023 using the following technologies.

See package.json for a full list of dependencies.

Getting Started

To get a local copy up and running, follow these steps.

Clone this repository

git clone https://github.com/seanoliver/audioflare.git

Install dependencies
```
cd audioflare
bun install
```
Create a Cloudflare account
Install Wrangler and log in
```
bun add wrangler --dev
# Wrangler is installed as a local dev dependency, so run it through bunx
bunx wrangler login
```
Rename .env.example to .env and follow the instructions linked in the comments to find each of the required keys and values.
Run the app
```
bun dev
```
Open http://localhost:3000 in your browser to check it out.
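
If the app fails to start, a missing value in `.env` is a likely cause. The sketch below shows one way to check for that up front. `missingEnvKeys` is a hypothetical helper, and the key names are placeholders rather than the ones this project uses; take the real names from `.env.example`.

```typescript
// Returns the names of required keys that are absent or blank in the given environment.
function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical key names for illustration only; use the names from .env.example.
const REQUIRED_KEYS = ["CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN"];

const missing = missingEnvKeys(
  { CLOUDFLARE_ACCOUNT_ID: "abc123", CLOUDFLARE_API_TOKEN: "" },
  REQUIRED_KEYS,
);
// missing is ["CLOUDFLARE_API_TOKEN"] because the token value is blank
```

In a Next.js API route, `process.env` can be passed as the first argument to get the same report against the real environment.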

Contributing

This is a great project for learning Cloudflare, AI Workers, and simple Next.js API Routes. Feel free to fork this repo and make it your own. If you have any questions or suggestions, please contact me!