Enhancement: Improve documentation regarding audio ingestion pipeline to inform the user when to use VAD

TomLucidor commented 5 days ago

(Just a bit of a personal note that could be included in the ReadMe)

1. Download Docker Desktop so you can see it running in the background
2. Download the following Dockerfile (assuming the machine has a GPU): https://github.com/rmusser01/tldw/blob/main/Helper_Scripts/Dockerfiles/tldw-nvidia_amd64_Dockerfile
3. Save this file in its own folder, change the file name to Dockerfile, and remove the .txt extension
4. Follow the commented instructions at the top of the file, running the first 2 commands
5. Bob's ya uncle m8

TomLucidor commented 5 days ago

Currently checking if

- There are not enough instructions on when to use "Speaker Diarization" (e.g. monologue vs dialogue) or "VAD" (e.g. packed video essay)
- Whisper distil-large seems to bug out once in a while; it would be good to have recommended model guides
- Connecting to Ollama from the TL;DW Docker container is hard, since setting the model being used (e.g. Phi-3) is hard
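For readers unfamiliar with the VAD option mentioned above: voice activity detection marks which stretches of audio contain speech so that silence can be skipped before transcription. The following is only a minimal energy-threshold sketch of that idea, not the detector TL;DW actually uses; the function names, the 30 ms frame size, and the 0.02 threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def detect_speech(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,        # hypothetical frame size
    threshold: float = 0.02,   # hypothetical RMS cutoff for "speech"
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose per-frame RMS energy meets the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold and start is None:
            start = i / sample_rate            # a speech span begins
        elif rms < threshold and start is not None:
            segments.append((start, i / sample_rate))  # the span ends
            start = None
    if start is not None:                      # audio ended mid-speech
        segments.append((start, len(samples) / sample_rate))
    return segments


def speech_ratio(segments: List[Tuple[float, float]], duration_s: float) -> float:
    """Fraction of the recording covered by detected speech."""
    if duration_s <= 0:
        return 0.0
    return sum(end - start for start, end in segments) / duration_s
```

A recording whose `speech_ratio` is close to 1.0 has little silence to trim, so one would expect VAD to change less there than on audio with long pauses. That is an inference from how the sketch works, not a measured result for this pipeline.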

rmusser01 commented 5 days ago

Thanks for the suggestions/feedback. Responding in order:

1. The tldw-nvidia_amd64_Dockerfile is the same as the Dockerfile in the root of the repo (it should be; a cursory glance says it's mostly the same, minus comments). The .txt being added is a result of your local system, not of how it's being stored. It's also my assumption that anyone who stumbles upon this project and is looking to get it working with Docker (or is aware of Docker) already has it installed.
   - That being said, the current documentation is very lacking and is on my to-do list, after I get more features put in, as those are higher priority for me right now.
2. Speaker Diarization/VAD: This pipeline needs to be tweaked/re-examined, as the diarization is pretty bad. I also believe it is unoptimized and could do with some improvement. This was on my to-do list last week, but things got in the way and it is still 'in waiting'.
3. Same as 2: the pipeline needs some tweaking. I do agree on having a default model/recommendations. I personally use large-v2, as I have found it to have the fewest issues with the current pipeline. large-v3 just misses the majority of speech, VAD or not.
4. I'm not sure what you mean here, as this is reliant on your personal setup. Are you referring to connecting the Docker container to the network outside of Docker so that Ollama can successfully communicate with it?

TomLucidor commented 4 days ago

@rmusser01 thanks for the reply. The first numbered list is a reminder to myself of how to install Docker with Docker, but I would love ya to address the bullet points instead (hopefully along with more documentation and guides). Right now I would especially focus on how to get the Docker container to recognize that I need the Phi-3 SLM (and maybe an embedding model) from Ollama to do summarization. The pipeline tho definitely needs more docs along with re-works, since people might not have been explicit about their use case or video genre type... should I explain more?

rmusser01 commented 3 days ago

edit: Ah, I understand what you meant. I will continue to prioritize as I have, and will address the documentation as I get to it. Explaining how to use docker or ollama is not in the scope of this project. It is expected that a current user should know or understand how to use docker with custom dockerfiles or docker volumes for external storage if they are pursuing usage with docker. It is also assumed that if a user is attempting to use ollama, they should have no issues understanding how the API works so that they can ensure that network connectivity is successful between the two. I will edit this issue to reflect your suggestion of improving the documentation regarding the audio ingestion pipeline.
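As a rough illustration of the connectivity check described above, the sketch below asks an Ollama server which models it has available. It assumes Ollama's default port 11434 and its `/api/tags` model-listing endpoint, and it assumes the container can reach the host as `host.docker.internal` (true by default on Docker Desktop; on Linux that name usually has to be mapped explicitly). The function names are hypothetical and nothing here is part of tldw itself.

```python
import json
import urllib.request
from typing import List


def ollama_base_url(host: str = "host.docker.internal", port: int = 11434) -> str:
    """Build the base URL a container would use to reach Ollama on the host (assumed defaults)."""
    return f"http://{host}:{port}"


def parse_model_names(tags_json: str) -> List[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags endpoint."""
    payload = json.loads(tags_json)
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Query the server; raises urllib.error.URLError if it cannot be reached."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

Running `list_ollama_models(ollama_base_url())` from inside the container is one way to confirm both that the network path works and that the intended model (e.g. a Phi-3 tag) has actually been pulled, before changing any tldw settings.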

TomLucidor commented 2 days ago

@rmusser01 using Docker and Ollama on their own is fairly easy. However, I am facing a weird problem with setting which Ollama models to use in TL;DW, since there is no such setting in the panel, and they recommended fixing the config.txt file (which is packed within the file system of Docker, and is slightly less streamlined to use). I will keep testing features from this panel and others to see what an average user with moderate experience would face, and draft notes accordingly.

rmusser01 commented 14 hours ago

FYI you can edit the config file from the web UI @TomLucidor

rmusser01 / tldw

Enhancement: Improve documentation regarding audio ingestion pipeline to inform the user when to use VAD #431