rasmuslos / ShelfPlayer

Native Audiobookshelf player for iOS & iPadOS

Request: Bookmark Summary #80

Open iamhenry opened 2 months ago

iamhenry commented 2 months ago

This will probably require a lot of work, but for years I've been looking for an app that can take my bookmarks and create a summary from them.

use case: take all my bookmarks and transcribe them from audio to text, with a max duration of 60 secs to transcribe

Basically what the Snipd podcast app is doing. It takes all my bookmarks from a podcast, generates a list, and provides them as notes for me to review and dive deeper into that topic.

lmk what you think 😊

rasmuslos commented 2 months ago

The idea is pretty cool, but this would depend on ABS providing transcriptions. I have looked into whisper and whisper.cpp to transcribe audio files, but I have not found the time to implement anything yet. Transcription would have to be added to ABS first, then transcriptions in the now playing view, and after that bookmark summaries.

I would also recommend opening an issue in the ABS repo for this feature, as this should probably be implemented server-side, too.

iamhenry commented 2 months ago

is that the only solution? is it possible to use an llm API via the cloud to generate it on the fly without having transcriptions?

iamhenry commented 2 months ago

looks like there's a discussion around it that's a bit stale due to lack of eng resources

someone does mention Snipd which is exactly what i was hoping we could have for ABS/ShelfPlayer

https://github.com/advplyr/audiobookshelf/issues/1723

rasmuslos commented 2 months ago

While it is possible to upload the audio file to an LLM provider like OpenAI and prompt it to generate a short summary, it's really not ideal. I am pretty sure this gets expensive real fast if you upload large audio files, which is required to give the model enough context. Also, I am not sure about the legal implications of this, e.g. whether you are even allowed to upload copyrighted works.

I have looked into whisper & whisper.cpp, tools that can be used to transcribe an item, and they work pretty well. While word-synced transcripts are not really possible, extracting timestamped sentences is reliable. But I could not find the time to implement anything in audiobookshelf yet. Using something like https://github.com/jzhang38/TinyLlama would probably suffice to then create summaries, but this requires the transcripts to exist in the first place.

And including an open source multimodal model to do the transcripts locally is not really an option. The app is around 15MB right now; including even a small one would inflate that to at least 6GB.
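
To make the timestamped-sentence idea above concrete, here is a rough Python sketch of how a bookmark could be mapped onto an existing transcript. Everything in it is an assumption for illustration: the `Segment` shape, the `bookmark_excerpt` helper, and the choice to look at the 60 seconds leading up to the bookmark are hypothetical and are not part of ABS, ShelfPlayer, or whisper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped sentence from a transcript (hypothetical shape)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def bookmark_excerpt(segments, bookmark_time, max_duration=60.0):
    """Join the text of every segment overlapping the window before a bookmark.

    Assumption: the window covers the `max_duration` seconds that end at
    `bookmark_time`, clamped so it never starts before 0.
    """
    window_start = max(0.0, bookmark_time - max_duration)
    picked = [
        s.text.strip()
        for s in segments
        # Keep a segment if any part of it falls inside the window.
        if s.end > window_start and s.start < bookmark_time
    ]
    return " ".join(picked)


# Made-up transcript data, only to show the call.
segments = [
    Segment(0.0, 20.0, "Chapter one begins."),
    Segment(20.0, 50.0, "The author introduces the main idea."),
    Segment(50.0, 95.0, "An example follows."),
    Segment(95.0, 130.0, "Then the topic changes."),
]
print(bookmark_excerpt(segments, bookmark_time=100.0))
```

The resulting excerpt is what would then be handed to a summarization model. The sketch still requires the transcript segments to exist first, as noted above.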

iamhenry commented 2 months ago

i think someone in the ABS community will be attempting to solve this issue with an initial prototype

i've been tracking the convo here https://github.com/advplyr/audiobookshelf/issues/1723#issuecomment-2088749583