jjmaldonis / obsidian-audio-notes

Easily take notes on podcasts and other audio files using Obsidian Audio Notes.
MIT License
150 stars, 2 forks

custom url for whisper api #8

Open 7596ff opened 1 year ago

7596ff commented 1 year ago

I run Whisper in Docker, and I would like the plugin to generate transcripts automatically against my own instance instead of my having to do so manually.

jjmaldonis commented 1 year ago

Great, can you share the docker build?

Is there an API available in the docker container that allows you to interact with whisper from outside the container?

7596ff commented 1 year ago

I just ran the commands in the readme: https://github.com/ahmetoner/whisper-asr-webservice

The API is pretty simple, you upload the file and tell it which format to return.
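For illustration, here is a rough sketch of what "upload the file and tell it which format to return" could look like from a client, using only the Python standard library. The port (`9000`), the `/asr` path, the `audio_file` form field, and the `task`/`output` query parameters are assumptions about the linked web service and are not confirmed anywhere in this thread; check them against that project's README before relying on them.

```python
import json
import urllib.parse
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

# ASSUMPTIONS (not confirmed in this thread): the service listens on
# localhost:9000, exposes POST /asr, reads the upload from a multipart
# field named "audio_file", and selects the format via an "output" parameter.
DEFAULT_BASE_URL = "http://localhost:9000"


def build_asr_url(base_url: str, output: str = "json") -> str:
    """Build the request URL, including the desired return format."""
    query = urllib.parse.urlencode({"task": "transcribe", "output": output})
    return f"{base_url.rstrip('/')}/asr?{query}"


def encode_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, base_url: str = DEFAULT_BASE_URL) -> dict:
    """Upload an audio file and return the parsed JSON response."""
    path = Path(audio_path)
    body, content_type = encode_multipart("audio_file", path.name, path.read_bytes())
    request = urllib.request.Request(
        build_asr_url(base_url),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A "custom url" setting in the plugin would then amount to letting the user override `base_url`.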

jjmaldonis commented 1 year ago

Ah I was hoping there was an existing docker image you used to run whisper. In order to run the commands within the plugin, it's necessary to create a REST API within the docker container to interact with Whisper, like running the model and getting the result. A downloadable docker image will need to be created to make it distributable. That's a good chunk of work and it'll be a while until I get to it. If you'd like to contribute that would be great too.

7596ff commented 1 year ago

If I'm reading you correctly, I don't think it's a good idea to automatically spin up a docker image from within Obsidian. I think it would be best to require users to spin up what I linked themselves, which is quite easy with Docker Desktop. https://github.com/djmango/obsidian-transcription does this, but it's flaky in my testing. I don't know anything about the internals of this plugin yet, but it seems moderately easy to run a command on an audio file that saves the JSON result next to it in-tree.
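As a minimal sketch of "saves the JSON result next to it in-tree": the helper below writes a transcript to a `.json` file that sits beside the audio file and shares its base name. The function names and the rule of skipping existing files unless `overwrite` is set are hypothetical choices for illustration, not how this plugin actually stores transcripts.

```python
import json
from pathlib import Path


def sidecar_path(audio_path: str) -> Path:
    """Map e.g. 'notes/episode.mp3' to 'notes/episode.json' (same folder)."""
    return Path(audio_path).with_suffix(".json")


def save_transcript(audio_path: str, transcript: dict, overwrite: bool = False) -> Path:
    """Write the transcript next to the audio file and return its path.

    An existing transcript is left untouched unless overwrite=True
    (a hypothetical policy, chosen so hand-edited transcripts are not lost).
    """
    target = sidecar_path(audio_path)
    if target.exists() and not overwrite:
        return target
    target.write_text(
        json.dumps(transcript, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return target
```

Keeping the result beside the audio file means the transcript travels with the vault and needs no separate index.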

jjmaldonis commented 1 year ago

The plugin would never spin up a docker container automatically.

The overlap between users who can spin up and use a docker container and users who do not know how to install Whisper on their own machine is likely small. So the use case for a docker container would be to support people who a) cannot install python and Whisper, b) can install docker and one image, and c) cannot interact with docker. The workflow I would want to implement is therefore to support accessing docker via an API, which is necessary for the plugin anyway.

Writing an API will probably take 4 hours, and as you have seen from the other image, it can be finicky. That is the majority of the work: create a container, preinstall Whisper, create a REST API, and publish the container. Hooking it up in the plugin will be straightforward after that. I likely won't get to this for a while.
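To give a sense of the "create a REST API" step, here is a very small sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `/transcribe` path, the raw-bytes request body, and above all `fake_transcribe`, which is a placeholder standing in for an actual Whisper call inside the container.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def fake_transcribe(audio: bytes) -> dict:
    """Placeholder: a real container would run Whisper on the audio here."""
    return {"text": "", "bytes_received": len(audio)}


def make_handler(transcribe: Callable[[bytes], dict]):
    """Build a request handler that passes uploaded bytes to `transcribe`."""

    class AsrHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/transcribe":  # hypothetical endpoint name
                self.send_error(404, "unknown endpoint")
                return
            length = int(self.headers.get("Content-Length", 0))
            audio = self.rfile.read(length)
            payload = json.dumps(transcribe(audio)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return AsrHandler


def serve(port: int = 9000, transcribe: Callable[[bytes], dict] = fake_transcribe) -> ThreadingHTTPServer:
    """Start the server in a background thread and return it.

    Binds to 127.0.0.1 for local testing; inside a container it would
    need to bind to 0.0.0.0 so the published port is reachable.
    """
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(transcribe))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The plugin side would then only need an HTTP POST with the audio bytes and a JSON parse of the response, which is why the container is the bulk of the work.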

7596ff commented 1 year ago

Thanks for that clarification. I'm one of those people who can't install python and whisper, because I cannot figure out python's dependency management, environments, and so on. I guess I also don't know the difference between a docker container and a docker image.

Thanks for considering this issue.