rasmuslos / ShelfPlayer

Native Audiobookshelf player for iOS & iPadOS

Request: Bookmark Summary #80

Open iamhenry opened 7 months ago

iamhenry commented 7 months ago

This will probably require a lot of work but for years i've been looking for an app that can take my bookmarks and create a summary from them

use case: take all my bookmarks and transcribe them from audio to text. max duration 60 secs to transcribe

Basically what the Snipd podcast app is doing. It takes all my bookmarks from a podcast, generates a list, and provides them as notes for me to review and dive deeper into that topic.

lmk what you think 😊

rasmuslos commented 6 months ago

The idea is pretty cool, but this would depend on ABS providing transcriptions. I have looked into whisper and whisper.cpp to transcribe audio files, but I have not found the time to implement anything yet. Transcription would have to be added to ABS first, then transcripts in the now playing view, and after that bookmark summaries.

I would also recommend opening an issue in the ABS repo for this feature, as this should probably be implemented server-side, too.

iamhenry commented 6 months ago

is that the only solution? is it possible to use an llm API via the cloud to generate it on the fly without having transcriptions?

iamhenry commented 6 months ago

looks like there's a discussion around it that's a bit stale due to lack of eng resources

someone does mention Snipd which is exactly what i was hoping we could have for ABS/ShelfPlayer

https://github.com/advplyr/audiobookshelf/issues/1723

rasmuslos commented 6 months ago

While it is possible to upload the audio file to an LLM provider like OpenAI and prompt it to generate a short summary, it's really not ideal. I am pretty sure this gets expensive real fast if you upload large audio files, which is required to give the model enough context. Also, I am not sure about the legal implications of this, e.g. whether you are even allowed to upload copyrighted works.

I have looked into whisper & whisper.cpp, things that can be used to transcribe an item, and they work pretty well. While word synced transcripts are not really possible, extracting timestamped sentences works pretty well. But I could not find the time to implement anything in audiobookshelf yet. Using something like https://github.com/jzhang38/TinyLlama would probably suffice to then create summaries, but this requires the transcripts to exist in the first place.
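To illustrate what "extracting timestamped sentences" could look like in practice, here is a minimal sketch that parses transcript lines into segments. It assumes the text is shaped like `[00:00:00.000 --> 00:00:04.500]   some sentence`, which is how the whisper.cpp command-line example prints segments by default; the `Segment` type and `parse_segments` function are hypothetical names, not part of whisper.cpp or ABS.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


# Assumed line shape: "[00:00:00.000 --> 00:00:04.500]   some sentence"
_LINE = re.compile(
    r"\[(\d+):(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_segments(output: str) -> list[Segment]:
    """Collect every timestamped line; anything else (logs, blanks) is skipped."""
    segments = []
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue
        g = match.groups()
        segments.append(Segment(_seconds(*g[0:4]), _seconds(*g[4:8]), g[8].strip()))
    return segments
```

Segments in this form are enough to look up which sentences surround a bookmark, without needing word-level alignment.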

And including an open source multimodal model to do the transcripts locally is not really an option. The app is around 15MB right now; including even a small one would inflate that to at least 6GB.

iamhenry commented 6 months ago

i think someone in the ABS community will be attempting to solve this issue with an initial prototype

i've been tracking the convo here https://github.com/advplyr/audiobookshelf/issues/1723#issuecomment-2088749583

iamhenry commented 3 months ago

snipd just released a huge update related to this. was curious to see what you thought and if you have any aspirations to add this feature? https://x.com/snipd_app/status/1811024587292864948

The feature allows me to upload any audio file and convert it to chapters/transcript while also having the ability to create highlights while autogenerating AI titles

i understand this is a huge task, but no other app i have checked is even thinking about this enhancement, and it could be a game changer for this app

attaching a few screenshots of the highlights and generated chapters

rasmuslos commented 3 months ago

I think the actual features are easy enough to implement. Generating a transcript using whisper and then feeding it, together with a timestamp and a good prompt, into an LLM like Llama is not that hard. The question is: where do you run these AIs?
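As a rough sketch of the "transcript plus timestamp plus prompt" step, the following picks the transcript segments that fall in the window before a bookmark and wraps them in a prompt string. Everything here is an assumption for illustration: the `(start, end, text)` tuple layout, the 60-second default window (borrowed from the original request), the function names, and the prompt wording. Sending the prompt to a model is left out.

```python
from typing import Iterable, Optional, Tuple


def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def build_summary_prompt(
    segments: Iterable[Tuple[float, float, str]],
    bookmark_s: float,
    window_s: float = 60.0,
) -> Optional[str]:
    """Build an LLM prompt from the segments overlapping the window
    [bookmark_s - window_s, bookmark_s]. Returns None if nothing overlaps."""
    window_start = max(0.0, bookmark_s - window_s)
    context = [
        text
        for start, end, text in segments
        if end > window_start and start <= bookmark_s
    ]
    if not context:
        return None
    excerpt = " ".join(context)
    return (
        "Summarize what the narrator says in this audiobook excerpt "
        "in one or two sentences.\n"
        f"The listener placed a bookmark at {format_timestamp(bookmark_s)}.\n\n"
        f"Excerpt:\n{excerpt}"
    )
```

The resulting string could be passed to whichever model ends up being used; as noted above, the open question is where that model runs, not how the prompt is assembled.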

The Snipd app is around 120 MB, but I don't think the models are included (Whisper Base is around 140 MB, Llama even bigger), so including them in the app binary is not possible. The memory consumption is also considerable (500 MB for whisper and multiple GB for Llama). Snipd runs them on their servers, which is why they are charging you for a subscription, a business model not suitable for ShelfPlayer. The AI features would have to be implemented in ABS, where large binaries, huge memory consumption and long program runtimes are possible. I looked into doing this but was so unfamiliar with the codebase that I didn't follow through. I may try again in the winter, but until someone adds these features to ABS it's not feasible to add them to ShelfPlayer.

iamhenry commented 3 months ago

thank you for looking into it.

I have been thinking about a practical solution for this. An idea i'd like to propose would be to enter our own API key to use OpenAI to do the transcribing for us. That way we don't need to rely on managing servers ourselves.

thoughts on this approach?

also, Groq seems to have Whisper Large at a fraction of the price. Perhaps that's a more budget friendly approach? https://wow.groq.com/
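One way to keep a bring-your-own-key approach cheap would be to send only the audio around each bookmark instead of the whole file. The sketch below computes those clip windows and a rough cost estimate. It assumes each clip covers the 60 seconds leading up to a bookmark (the limit from the original request) and that the provider bills per minute of audio; `clip_windows`, `estimated_cost`, and the `price_per_minute` value are hypothetical and do not reflect any provider's actual pricing.

```python
from typing import Iterable, List, Tuple


def clip_windows(
    bookmarks_s: Iterable[float], max_clip_s: float = 60.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds, each ending at a bookmark.
    Overlapping windows are merged so the same audio is not sent twice,
    which means a merged window can be longer than max_clip_s."""
    windows = sorted((max(0.0, b - max_clip_s), float(b)) for b in bookmarks_s)
    merged: List[Tuple[float, float]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def estimated_cost(
    windows: Iterable[Tuple[float, float]], price_per_minute: float
) -> float:
    """Total audio minutes across all windows times a per-minute price."""
    minutes = sum(end - start for start, end in windows) / 60
    return minutes * price_per_minute
```

For example, bookmarks at 30 s, 100 s, 130 s and 400 s give three windows totalling three minutes of audio, regardless of how long the audiobook is.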
