o-oconnell / mp4grep

mp4grep is a CLI for transcribing and searching audio/video files
GNU General Public License v3.0

Issue when installing #19

Closed JunkyardCat closed 2 years ago

JunkyardCat commented 2 years ago

I'm using Debian 11. I downloaded 0.1.1, extracted it, cd'd into the folder, and ran source install.sh. Please see the output below:

readlink: invalid option -- 'b'
Try 'readlink --help' for more information.
Completed environment setup for mp4grep:
MP4GREP_CACHe=./.mp4grep_cache
MP4GREP_MODEL=./model
PATH=$PATH:./bin

Variable exported in ~/.bashrc

I tried running mp4grep and it says command not found.

o-oconnell commented 2 years ago

Unfortunately the bare install.sh is not very portable (see https://github.com/o-oconnell/mp4grep/issues/15).

I would recommend setting your PATH, MP4GREP_CACHE, and MP4GREP_MODEL environment variables manually. If the mangled shell script already appended definitions to your .bashrc, make sure to edit that file and correct them.

Commands to put in bashrc (or just run in your shell prior to mp4grep use):

export MP4GREP_CACHE=[cache dir]
export MP4GREP_MODEL=[model dir]
export PATH=$PATH:[directory_where_mp4grep_binary_is]
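To confirm the exports above took effect, a small check along these lines may help. This is only a sketch: check_mp4grep_env is a hypothetical helper name and is not part of mp4grep, and it assumes both variables should point at existing directories.

```shell
# Hypothetical helper (not shipped with mp4grep): report missing settings.
check_mp4grep_env() {
    status=0
    for var in MP4GREP_CACHE MP4GREP_MODEL; do
        # Look up the value of the variable whose name is stored in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set"
            status=1
        elif [ ! -d "$value" ]; then
            # Assumption: both settings are expected to be directories
            echo "$var points to a missing directory: $value"
            status=1
        fi
    done
    # The binary must be reachable through PATH
    if ! command -v mp4grep >/dev/null 2>&1; then
        echo "mp4grep is not on PATH"
        status=1
    fi
    return "$status"
}
```

Running check_mp4grep_env in a new shell after editing .bashrc shows whether the definitions were picked up; it returns a non-zero status if anything is missing.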

o-oconnell commented 2 years ago

@JunkyardCat have you fixed this issue? Installation follows a different process with the latest version.

rrediske commented 2 years ago

I had trouble installing as well; make install complained that it couldn't find Parmap, so it had an error on line 3 of mp4grep.ml (where it tried: open Parmap).

Eventually I re-read the opam init message, which suggested that eval $(opam env) needed to run if the environment variables were not set by .bash_profile. I ran that and make install worked.
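For anyone hitting the same Parmap error, the step above can be sketched roughly as follows. The function name load_opam_env is made up for illustration, and the ocamlfind check assumes ocamlfind is installed in the active switch and that the library is registered under the findlib name parmap.

```shell
# Hypothetical helper: load opam's environment into the current shell.
load_opam_env() {
    if ! command -v opam >/dev/null 2>&1; then
        echo "opam not found on PATH" >&2
        return 1
    fi
    # Export the variables for the current switch into this shell
    eval "$(opam env)"
    # Optional: confirm the Parmap library is visible before running make install
    # (assumes ocamlfind is available and the package is named "parmap")
    if command -v ocamlfind >/dev/null 2>&1; then
        ocamlfind query parmap
    fi
}
```

If ocamlfind query fails after the environment is loaded, the library is probably not installed in the active switch, which is a different problem from the unset environment described above.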

After that, mp4grep complained that MP4GREP_CACHE wasn't set, so I set it to ~/Templates/mp4cache/. Once that was satisfied, it complained that MP4GREP_MODEL wasn't set. I downloaded an English model from https://alphacephei.com/vosk/models, set MP4GREP_MODEL to that directory and it worked.

It's really slow on first run, even on an AMD 5900x, maybe in part because I chose an accurate VOSK model (very large 2.7 GB), but it's also only running on 1 of my 24 cores. It took 23 minutes to index a 60 minute meeting.

I had run opam switch create mp4grep 4.12.0+domains+effects after make install. It did some work without reporting any errors and suggested running eval $(opam env --switch=mp4grep) to update even more shell environment variables. I didn't trust that it had installed correctly, so I ran opam switch remove mp4grep 4.12.0+domains+effects and then created it again. That got me down to 18 minutes to process the same file, but my CPU is still only averaging 5% utilization across all cores when processing a new file.

At this point, I gave up on multicore support as I had already spent 5 hours getting this far. I ran out of time to play and I only had a few strands of hair left to pull out.

rrediske commented 2 years ago

Not sure if this helps?

opam switch remove mp4grep 4.12.0+domains+effects
Switch mp4grep and all its packages will be wiped. Are you sure? [Y/n] n
The compiler switch 4.12.0+domains+effects does not exist.

opam switch create mp4grep 4.12.0+domains+effects
[ERROR] There already is an installed switch named mp4grep

o-oconnell commented 2 years ago

The bottleneck for mp4grep is almost entirely the transcription process, which is a snippet of C code that runs Vosk. Once a file has been transcribed, later searches of it will be much faster due to caching. Your experience with multicore is expected, since Vosk does not support multithreaded transcription of a single file; concurrency is only available across multiple files. Unfortunately this project is a wrapper around a fairly large dependency (which is itself a wrapper around its own fairly large dependency, Kaldi). We're currently improving the install process.