git clone https://github.com/rmusser01/tldw
cd tldw
python -m venv .\
. .\scripts\activate.ps1
pip install -r requirements.txt
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api openai -k tag_one tag_two tag_three
python summarize.py -gui
python summarize.py -gui --local_llm
(will ask you questions about which model to download)
For commercial API usage with this project, I'd recommend: Claude Sonnet 3.5, Cohere Command R+, or DeepSeek. Flipside, I would honestly say none: they (the largest players) will gaslight you and charge you money for it. Fun. From @nrose 05/08/2024 on Threads:
No, it’s a design. First they train it, then they optimize it. Optimize it for what - better answers? No. For efficiency.
Per watt. Because they need all the compute they can get to train the next model. So it’s a sawtooth.
The model declines over time, then the optimization makes it somewhat better, then in a sort of reverse asymptote,
they dedicate all their “good compute” to the next bigger model. Which they then trim down over time, so they can train
the next big model… etc etc.
None of these companies exist to provide AI services in 2024. They’re only doing it to finance the things they want to
build in 2025 and 2026 and so on, and the goal is to obsolete computing in general and become a hidden monopoly like
the oil and electric companies.
2024 service quality is not a metric they want to optimize; they’re forced to, only to maintain some directional income.
For offline LLM usage, I recommend a fine-tuned Mistral-Instruct v0.2 model; the --local_llm helper downloads 'mistral-7b-instruct-v0.2.Q8' (an 8GB model) from Huggingface.
Alternatively, there is https://huggingface.co/microsoft/Phi-3-mini-4k-instruct, which you can get in a GGUF format from here: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
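If you want to grab that GGUF build from the command line, huggingface-cli (part of the huggingface_hub package) can fetch it; the quantization filename below is one of the files in that repo, and ./models is just an example destination:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir ./models
```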
CLI Screenshot
GUI Screenshot
git clone https://github.com/rmusser01/tldw
or manually download it (Green code button, upper right corner -> Download ZIP) and extract it to a folder of your choice.
python -m venv .\
. .\scripts\activate.ps1
py -m pip install --upgrade pip wheel
For Nvidia GPUs (CUDA 11.8): pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
For AMD GPUs (DirectML): pip install torch-directml
For CPU only: pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
- may take a bit of time...

Transcribe audio from a Youtube URL:
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s
Transcribe audio from a Youtube URL & Summarize it using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api <your choice of API>
- Make sure to put your API key into config.txt under the appropriate API variable.
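If you'd rather not edit config.txt, the -key flag (see the help output further down) passes the key per invocation:

```bash
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api openai -key <your_openai_api_key>
```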
Transcribe a list of Youtube URLs & Summarize them using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:
python summarize.py ./ListofVideos.txt -api <your choice of API>
- Make sure to put your API key into config.txt under the appropriate API variable.
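The list file is plain text with one entry per line; per the note further down, entries can be URLs or local file paths. For example:

```text
https://www.youtube.com/watch?v=4nd1CDZP21s
https://example.com/another/video.mp4
./local/file_on_your/system.mp4
```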
Transcribe & Summarize a List of Videos on your local filesystem with a text file:
python summarize.py -v ./local/file_on_your/system
Download a Video with Audio from a URL:
python summarize.py -v https://www.youtube.com/watch?v=4nd1CDZP21s
Perform a summarization of a longer transcript using 'Chunking':
python summarize.py -roll -detail 0.01 https://www.youtube.com/watch?v=4nd1CDZP21s
- The detail level ranges from 0.01 to 1.00, in increments of 0.01.
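As a rough illustration of what that knob does (a sketch, not the repo's actual implementation; per the -detail help text further down, 0.01 means lots of chunks and 1.00 means few):

```python
# Illustrative mapping from the -detail knob to a chunk count.
# Inverted per the help text: low detail -> many chunks, high detail -> few.
def num_chunks_for(detail: float, min_chunks: int = 1, max_chunks: int = 100) -> int:
    detail = min(max(detail, 0.01), 1.00)  # clamp to the documented range
    return round(max_chunks - detail * (max_chunks - min_chunks))

print(num_chunks_for(0.01))  # ~99 chunks: fine-grained rolling summary
print(num_chunks_for(1.00))  # 1 chunk: closer to a single-pass summary
```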
Run it as a WebApp:
python summarize.py -gui
- This requires you to either stuff your API keys into the config.txt file, or pass them into the app every time you want to use it.

Requirements
- ffmpeg (Linux: sudo apt install ffmpeg or dnf install ffmpeg)
- For AMD GPUs on Windows: pip install torch-directml
- Update your GPU Drivers/CUDA drivers if you'll be running an LLM locally.

Linux
git clone https://github.com/rmusser01/tldw
cd tldw
python -m venv ./
source ./bin/activate
python -m pip install --upgrade pip wheel
For Nvidia GPUs (CUDA 11.8): pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
For CPU only: pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
Windows
git clone https://github.com/rmusser01/tldw
cd tldw
python -m venv ./
. .\scripts\activate.ps1
or for CMD: .\scripts\activate.bat
py -m pip install --upgrade pip wheel
For Nvidia GPUs (CUDA 11.8): pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
For CPU only: pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
For AMD GPUs (DirectML): pip install torch-directml
Linux && Windows
pip install -r requirements.txt
- may take a bit of time...

Set your API keys in the config.txt file.
(make sure you've activated the venv: run source ./bin/activate or .\scripts\activate.ps1 or .\scripts\activate.bat from the tldw directory)
python ./summarize.py <video_url>
- The video URL does not have to be a youtube URL. It can be any site that ytdl supports.

Set your API keys in the config.txt file.
(make sure you've activated the venv: run source ./bin/activate or .\scripts\activate.ps1 or .\scripts\activate.bat from the tldw directory)
python ./summarize.py -gui
- This will launch a webapp that will allow you to interact with the script in a more user-friendly manner.
- You can either put your API keys into the config.txt file, or pass them in when you use the GUI.
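To expose the GUI beyond localhost, combine -gui with the --server_mode and --port flags documented in the help output further down:

```bash
python summarize.py -gui --server_mode --port 7860
```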
(make sure you've activated the venv: run source ./bin/activate or .\scripts\activate.ps1 or .\scripts\activate.bat from the tldw directory)
Pass --local_llm to the script (python summarize.py --local_llm), and it'll ask you if you want to download a model, and which one you'd like to download.

python summarize.py https://example.com/video.mp4
python summarize.py /path/to/your/localfile.mp4
python summarize.py ./path/to/your/text_file.txt
python summarize.py -gui --local_llm
Save time and use the config.txt file; it lets you set these options once and have them applied on every run.
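As a purely hypothetical sketch of the idea (the real variable names are whatever the config.txt shipped with the repo defines; edit that file rather than copying this):

```text
# Hypothetical config.txt sketch - check the repo's actual config.txt for real variable names
[API]
openai_api_key = <your_key>
anthropic_api_key = <your_key>
cohere_api_key = <your_key>
```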
positional arguments:
input_path Path or URL of the video
options:
-h, --help show this help message and exit
-v, --video Download the video instead of just the audio
-api API_NAME, --api_name API_NAME
API name for summarization (optional)
-key API_KEY, --api_key API_KEY
API key for summarization (optional)
-ns NUM_SPEAKERS, --num_speakers NUM_SPEAKERS
Number of speakers (default: 2)
-wm WHISPER_MODEL, --whisper_model WHISPER_MODEL
Whisper model (default: small). Options: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3
-off OFFSET, --offset OFFSET
Offset in seconds (default: 0)
-vad, --vad_filter Enable VAD filter
-log {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Log level (default: INFO)
-gui, --user_interface
Launch the Gradio user interface
-demo, --demo_mode Enable demo mode
-prompt CUSTOM_PROMPT, --custom_prompt CUSTOM_PROMPT
Pass in a custom prompt to be used in place of the existing one.
(Probably should just modify the script itself...)
-overwrite, --overwrite
Overwrite existing files
-roll, --rolling_summarization
Enable rolling summarization
-detail DETAIL_LEVEL, --detail_level DETAIL_LEVEL
Mandatory if rolling summarization is enabled, defines the chunk size.
Default is 0.01(lots of chunks) -> 1.00 (few chunks)
Currently only OpenAI works.
-model LLM_MODEL, --llm_model LLM_MODEL
Model to use for LLM summarization (only used for vLLM/TabbyAPI)
-k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...]
Keywords for tagging the media, can use multiple separated by spaces (default: cli_ingest_no_tag)
--log_file LOG_FILE Where to save logfile (non-default)
--local_llm Use a local LLM from the script (downloads llamafile from GitHub and 'mistral-7b-instruct-v0.2.Q8', an 8GB model, from Huggingface)
--server_mode Run in server mode (This exposes the GUI/Server to the network)
--share_public SHARE_PUBLIC
This will use Gradio's built-in ngrok tunneling to share the server publicly on the internet. Specify the port to use (default: 7860)
--port PORT Port to run the server on (default: 7860)
Sample commands:
1. Simple Sample command structure:
summarize.py <path_to_video> -api openai -k tag_one tag_two tag_three
2. Rolling Summary Sample command structure:
summarize.py <path_to_video> -api openai -prompt "custom_prompt_goes_here-is-appended-after-transcription" -roll -detail 0.01 -k tag_one tag_two tag_three
3. FULL Sample command structure:
summarize.py <path_to_video> -api openai -ns 2 -wm small.en -off 0 -vad -log INFO -prompt "custom_prompt" -overwrite -roll -detail 0.01 -k tag_one tag_two tag_three
4. Sample command structure for UI:
summarize.py -gui -log DEBUG
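The -wm and -vad options line up with what the faster-whisper library exposes; assuming that backend (an assumption on my part; the repo's transcription code may differ), the transcription step looks roughly like this:

```python
# Minimal sketch of a faster-whisper transcription step ("audio.wav" is illustrative).
from faster_whisper import WhisperModel

model = WhisperModel("small", device="auto", compute_type="default")  # "small" = the -wm default
segments, info = model.transcribe("audio.wav", vad_filter=True)       # vad_filter mirrors -vad
transcript = " ".join(segment.text.strip() for segment in segments)
print(transcript)
```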
Download Audio only from URL -> Transcribe audio:
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s
Transcribe audio from a Youtube URL & Summarize it using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api <your choice of API>
- Make sure to put your API key into config.txt under the appropriate API variable
Download Video with audio from URL -> Transcribe audio from Video:
python summarize.py -v https://www.youtube.com/watch?v=4nd1CDZP21s
Download Audio+Video from a list of videos in a text file (can be file paths or URLs) and have them all summarized:
python summarize.py --video ./local/file_on_your/system --api_name <your choice of API>
Transcribe & Summarize a List of Videos on your local filesystem with a text file:
python summarize.py -v ./local/file_on_your/system
Run it as a WebApp:
python summarize.py -gui
By default, videos, transcriptions and summaries are stored in a folder with the video's name under './Results', unless otherwise specified in the config file.
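As a hypothetical illustration only (actual filenames depend on the video title and your settings):

```text
Results/
  Some_Video_Title/
    Some_Video_Title.wav          # converted audio (hypothetical name)
    Some_Video_Title.txt          # transcript (hypothetical name)
    Some_Video_Title_summary.txt  # summary (hypothetical name)
```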
Linux:
git clone https://github.com/ggerganov/llama.cpp
Run make in the llama.cpp folder
./server -m ../path/to/model -c <context_size>
Windows:
git clone https://github.com/ggerganov/llama.cpp
Run make in the llama.cpp folder
server.exe -m ..\path\to\model -c <context_size>
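Once the server is up (llama.cpp's server listens on port 8080 by default), you can smoke-test its /completion endpoint before pointing the -api llama option at it; the prompt below is illustrative:

```bash
curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize the following transcript: ...", "n_predict": 128}'
```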
Double click KoboldCPP.exe and select model OR run "KoboldCPP.exe --help" in CMD prompt to get command line arguments for more control.
Generally you don't have to change much besides the Presets and GPU Layers. Run with CuBLAS or CLBlast for GPU acceleration.
Select your GGUF or GGML model you downloaded earlier, and connect to the displayed URL once it finishes loading.
On Linux, we provide a koboldcpp-linux-x64 PyInstaller prebuilt binary on the releases page for modern systems. Simply download and run the binary.
curl -fLo koboldcpp https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp-linux-x64 && chmod +x koboldcpp
Or build it yourself: run ./koboldcpp.sh dist and run the generated binary.
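KoboldCpp serves a KoboldAI-compatible API (port 5001 by default); a quick smoke test against /api/v1/generate, with an illustrative prompt:

```bash
curl http://127.0.0.1:5001/api/v1/generate -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize the following transcript: ...", "max_length": 128}'
```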
git clone https://github.com/oobabooga/text-generation-webui
or download the Source code (zip) file -> Extract -> Continue below.
Run the start_linux.sh, start_windows.bat, start_macos.sh, or start_wsl.bat script depending on your OS.
- summarize.py - Main script for downloading, transcribing, and summarizing videos, audio files, books and documents.
- config.txt - Config file used for settings for the main app.
- requirements.txt - Packages to install for Nvidia GPUs
- AMD_requirements.txt - Packages to install for AMD GPUs
- llamafile - Llama.cpp wrapper for local LLM inference; multi-platform and multi-LLM compatible.
- prompts.db - SQLite DB that stores all the prompts.
- App_Function_Libraries Folder - Folder containing all of the application's function libraries
- Tests Folder - Folder containing tests for the application (ha.)
- Helper_Scripts - Folder containing helper scripts for the application
- HF - Dockerfile and requirements.txt for Huggingface Spaces hosting
- models - Folder containing the models for the speaker diarization LLMs
- tldw-original-scripts - Original scripts from the original repo
- summarize.py - download, transcribe and summarize audio
  1. converts the downloaded .m4a file to .wav
  2. transcribes the .wav file to .txt
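For reference, that .m4a -> .wav conversion is a standard ffmpeg invocation; a typical form (16 kHz mono PCM, the usual whisper input; the repo's exact flags may differ) is:

```bash
ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```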
- chunker.py - break text into parts and prepare each part for LLM summarization
- roller-*.py - rolling summarization
- compare.py - prepare LLM outputs for webapp
- compare-app.py - summary viewer webapp