kanjieater / SubPlz

🫴 Generate accurate subtitles from audio, align existing subs to videos, generate your own Kindle's Immersion Reading like audiobook subs 📖🎧
MIT License
Need some help setting it up #1

Closed Whitesttax closed 1 year ago

Whitesttax commented 1 year ago

I don't think this is an issue on your part, I just don't know how to run it.

I have researched and followed a guide for each of the following: How to create WSL. Installed Python 3.9.9 successfully on WSL with Ubuntu; I'm confident I did this part right. Installed pip. Installed ffmpeg on Ubuntu and added the path like this: export PATH=$PATH:/bin/ffmpeg Installed stable-ts and added it to the path just in case with export PATH=$PATH:$HOME/pacote/ since I got the yellow message saying it was not in the path.

After this, step 4 sounds like it's optional, is it? I just want to run this, so if it's optional I'd skip it until I can get it to work. So I didn't do step 4.

Then I tried to run it with a single, very small audiobook file. The folder is called "name" and the audiobook file inside is "name.m4b"; in the folder I also have "script.txt" with the ebook in it. When I run ./run.sh "$(wslpath -a "wsl.localhost\Ubuntu\home\pacote\name\")" (I tried copying your example), I only get a new line with ">" and a blinking _. If I add another \ after "name" like in your example (name\ plus another \ before the ")"), it just gives me ~bash: ./run.sh: No such file or directory

Would you mind explaining what I'm missing?

edit: here's a picture of how it is just in case image

kanjieater commented 1 year ago

Yeah, I'm happy to help. Sorry it's not clearer. You're the first person besides me to use this. So first off, treat all the installs as if they were inside Ubuntu. You don't need to install ffmpeg on Windows, but you do need to install it in Ubuntu. Then it should just be on the path.

Next, try running the split.sh command to get the m4b split into parts. Then you can run split_run which should run without any memory issues if your m4b got cut into chapters.

The run command you had should work too, if you don't feel like splitting and have enough RAM to spare, though I haven't been using it myself lately. Try converting your path to a WSL path, so something like "/mnt/c/documents/name", and use that instead of the wslpath -a command, which only has the purpose of converting a Windows path to a Unix path.
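
For reference, the conversion that wslpath performs on a drive path can be sketched in Python. This is only an illustration, assuming the usual WSL convention that Windows drives are mounted under /mnt/<drive letter>; the function name windows_to_wsl is hypothetical and not part of this repo. Separately, in bash a \" inside double quotes is an escaped quote, so a Windows path ending in \ right before the closing quote leaves the string open, which is likely why the shell showed the ">" continuation prompt.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path: str) -> str:
    """Convert a Windows drive path to its usual WSL mount path.

    Hypothetical helper for illustration only. It assumes drives are
    mounted at /mnt/<letter>, which is the WSL default.
    """
    win = PureWindowsPath(path)
    drive = win.drive  # e.g. "C:" for a drive path
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"not a drive-letter path: {path!r}")
    # win.parts[0] is the anchor ("C:\\"); the rest are folder names.
    return str(PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:]))


print(windows_to_wsl(r"C:\Documents\name"))  # /mnt/c/Documents/name
```

Passing the resulting /mnt/... path directly to ./run.sh avoids the quoting problem altogether.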

I'm mobile right now, but if any of that doesn't make sense, let me know what you can try and what happens and I'll try to make clearer docs based on your pain points

Whitesttax commented 1 year ago

This is what you meant, right? I feel like I'm missing something basic about ubuntu. All the installs I did were by command line on ubuntu, not windows. So ffmpeg, python, pip, stable-ts, were all installed on ubuntu wsl. Again, I didn't do step 4.

image

I get this if I type stable-ts, not sure if it helps image

kanjieater commented 1 year ago

Ah I think I see your issue. image Your path is ./run.sh which means you are saying ./ is your current directory, and it has run.sh inside it. Since you showed your ls command, we can see you are not inside the repo!

Try running the command from inside this repo. So cd ~/thisProject/ first; run.sh should be somewhere on your path.

kanjieater commented 1 year ago

I can clean up the docs to clarify this but you should download the project with git. git clone https://github.com/kanjieater/AudiobookTextSync.git Then cd AudiobookTextSync

You should then have a folder with this stuff in it when you ls image

Whitesttax commented 1 year ago

I see! I did that, here's how it looks now: image

That was done instantly so I think something else is missing. Also can't find that .vtt file.

kanjieater commented 1 year ago

Go ahead and git pull again. I've updated the readme. You should now be using ./run.sh for split or unsplit m4b's, and they should just work now. Let me know how it goes, and thanks for being patient with the troubleshooting.

In addition, there is now an anki command documented that can turn your m4b into an anki deck and import it for you.

Whitesttax commented 1 year ago

Hey, thanks for the answer!! I'm making progress now. I did the git pull again (had to delete the old folder first) and I was getting some errors at first, like: ./split.sh: line 5: cd: too many arguments. I then pasted the audiobook folder inside AudiobookTextSync and used the absolute path for Ubuntu:

powershell: \wsl.localhost\Ubuntu\home\pacote\AudiobookTextSync> wsl wslpath -a name
result: /home/pacote/AudiobookTextSync/name

Then it worked! Now I have the files split, but I'm getting an error when trying to ./run.sh: image

Edit: I tried changing run.sh lines 17, 19, and 20 from "python" to "python3" just to see what happens, and then I get this instead: image I don't know if that helps!

I also tried rerunning pip install stable-ts just in case that's what the error meant, but it still happens even after rerunning it.

kanjieater commented 1 year ago

Good to hear you are making progress! Try these things

  1. You need to be on python 3.9.9. python --version should say 3.9.9
  2. Then install your dependencies, stable-ts from pip as pip install git+https://github.com/jianfch/stable-ts.git
  3. pip install -r requirements.txt
  4. Don't change the script; get Python 3 on your path as python. (If step one is working, there's nothing to do here.)

If you do all of those things it should be working 👍

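
A quick way to confirm the first steps is a small check script. This is a minimal sketch, not something shipped with the repo: the function name check_environment is hypothetical, and it assumes stable-ts installs under the import name stable_whisper.

```python
import importlib.util
import sys


def check_environment(required=(3, 9, 9), module="stable_whisper"):
    """Return a list of problems found; an empty list means all checks passed.

    Assumption: stable-ts is importable as `stable_whisper`.
    """
    problems = []
    current = tuple(sys.version_info[:3])
    if current != tuple(required):
        problems.append(
            "Python is %d.%d.%d, expected %d.%d.%d" % (current + tuple(required))
        )
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module) is None:
        problems.append(f"module {module!r} is not installed for this Python")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

Run it with the same python command that run.sh uses, so that it inspects the interpreter the script will actually start.
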
Whitesttax commented 1 year ago

Did as you said and it's running, but it's been like this for a while; is that common? There is no .srt file in the folder yet, so it's probably not finished. My PC is also still running hard. image

Also, I saw the readme and it's much easier to understand now.

Edit: might've been my fault, here's what I got after the program ended: ./run.sh: line 17: 838 Killed python3 split_run.py "$FOLDER"

Forgot to change the python thing, will try a guide using an alias. Edit2: added python=python3 to .bashrc. Now "python" gives me Python 3.9.9, but when I try ./run.sh I still get the same error, line 17: python: command not found image
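
One likely explanation, offered as a general note about bash rather than anything specific to this repo: aliases defined in .bashrc apply to interactive shells, while a script such as run.sh looks up python as an executable on PATH. The sketch below shows what that lookup finds; resolve_command is a hypothetical helper name.

```python
import shutil


def resolve_command(name: str):
    """Return the executable found on PATH for `name`, or None.

    This mirrors what a script sees: shell aliases are not consulted.
    """
    return shutil.which(name)


for name in ("python", "python3"):
    print(f"{name}: {resolve_command(name) or 'not found on PATH'}")
```

If python is reported as not found while python3 resolves, a common remedy is to put an executable or symlink named python on PATH that points at the Python 3 interpreter, instead of relying on an alias.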

kanjieater commented 1 year ago

I'm not sure which step you were at when it was slow. Which step did it seem like? Here is what exists now:

1.1 (not pushed yet) Filter down audio to improve future results: slow, and probably not heavy on CPU or GPU. Heavier on CPU.
1.2 split_run & stable-ts: starts off heavy on CPU & RAM to identify the audio spectrum.
1.3 stable-ts: the GPU-heavy and long part, where it tries to transcribe text from the audio. This is the long progress bar part.
1.4 Merge the vtt's for split subs.
2.1 Split the script.
2.2 Match the script to the generated transcription to get good timestamps.

Any idea where you were at in there?

Whitesttax commented 1 year ago

Since the progress bar was completed I'm guessing 1.4+ That's all I got at the end. My laptop was running really hard there, after the progress bar hit 100. image Here's the folder just in case: image image

kanjieater commented 1 year ago

Uh oh, you ran out of memory! Unfortunately that's step 1.3. If it won't work for you there, I don't have a software solution. You could change split_run.py from using the large-v2 model to the medium model, but the performance of the model will be affected.

Whitesttax commented 1 year ago

I see! I'll try again while closing other apps and monitor it with task manager

kanjieater commented 1 year ago

image You can modify this line in split_run.py to use any of the models. It may get overwritten in future git pulls though. You can use any that Whisper supports, I think. Try a smaller one, but smaller means less training, which means less accurate results.
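
As a rough guide for picking a size, the sketch below chooses the largest model that fits a given amount of VRAM. The figures are the approximate requirements listed in the Whisper README and are assumptions here; real usage through stable-ts may differ, and pick_model is a hypothetical helper, not part of split_run.py.

```python
# Approximate VRAM needs in GB, per the Whisper README (assumed, not measured).
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v2", 10),
]


def pick_model(vram_gb: float):
    """Return the largest model whose approximate need fits, or None."""
    fitting = [name for name, need in APPROX_VRAM_GB if need <= vram_gb]
    return fitting[-1] if fitting else None


print(pick_model(6))    # medium
print(pick_model(0.5))  # None
```

A result of None means that even the smallest model is unlikely to fit on that GPU.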

The other solution would be to find a way to cut your files up even smaller. I do not have a solution for that though currently. You would have to make sure to not cut up anywhere besides silent parts, and ideally not between a sentence.
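
One way to approach this, sketched under assumptions: if you already have a list of silent intervals (for example from a tool such as ffmpeg's silencedetect filter), you can pick cut points at the middle of silences so that chunks stay under a target length where possible. The function choose_cut_points is hypothetical, and it cannot tell whether a silence falls between sentences.

```python
def choose_cut_points(silences, total_duration, max_chunk):
    """Pick cut times (seconds) at the midpoints of silent intervals.

    silences: list of (start, end) tuples in seconds, sorted by start.
    A chunk may still exceed max_chunk if no silence occurs in time.
    """
    cuts = []
    chunk_start = 0.0
    candidate = None  # latest silence midpoint seen in the current chunk
    for start, end in silences:
        mid = (start + end) / 2
        # Reaching this silence would make the chunk too long,
        # so cut at the previous silence instead.
        if mid - chunk_start > max_chunk and candidate is not None:
            cuts.append(candidate)
            chunk_start = candidate
        candidate = mid
    # Handle the tail between the last cut and the end of the audio.
    if total_duration - chunk_start > max_chunk and candidate is not None:
        cuts.append(candidate)
    return cuts


silences = [(9, 11), (19, 21), (29, 31), (39, 41)]
print(choose_cut_points(silences, total_duration=50, max_chunk=25))  # [20.0, 40.0]
```

The resulting times could then be passed to whatever tool does the actual splitting.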

Whitesttax commented 1 year ago

I thought it was RAM not VRAM, so yeah it didn't work even with "tiny" image I guess I need to buy a GPU!

But really, thanks a lot for all the help, I learned a ton about ubuntu/python. I wanted to use this with jpdb's mpv plugin, which has color coded words based on my account's known words. It'd be amazing to mine as if it was anime, but for audiobooks.

kanjieater commented 1 year ago

Ah that's a good point - yes it's VRAM though the error from the OS doesn't really indicate that. For Japanese learning we might share some resources for this on TMW or check out my discord for more updates.

kanjieater commented 1 year ago

Just want to correct something I said above.

After running a few tests it seems that Medium tends to outperform large actually due to how stable-ts works. https://github.com/jianfch/stable-ts/issues/80#issuecomment-1442302091

Tiny also seems to perform almost as well, to the point that I might leave that as the default.

If you have new questions or give it a go again in the future, feel free to open a new issue.