Open iamhenry opened 2 months ago
The idea is pretty cool, but this would depend on ABS providing transcriptions. I have looked into whisper and whisper.cpp to transcribe audio files, but I have not found the time to implement anything yet. Transcription would have to be added to ABS first, then transcripts in the now playing view, and after that bookmark summaries.
I would also recommend opening an issue in the ABS repo for this feature, as this should probably be implemented server-side, too.
is that the only solution? is it possible to use a cloud LLM API to generate the summary on the fly without having transcriptions?
looks like there's a discussion around it that's a bit stale due to lack of eng resources
someone does mention Snipd which is exactly what i was hoping we could have for ABS/ShelfPlayer
While it is possible to upload the audio file to an LLM provider like OpenAI and prompt it to generate a short summary, it's really not ideal. I am pretty sure this gets expensive real fast if you upload large audio files, which is required to give the model enough context. Also, I am not sure about the legal implications of this, e.g. whether you are even allowed to upload copyrighted works.
I have looked into whisper & whisper.cpp, things that can be used to transcribe an item, and they work pretty well. While word-synced transcripts are not really possible, extracting timestamped sentences works pretty well. But I could not find the time to implement anything in audiobookshelf yet.
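To make the "timestamped sentences" idea a bit more concrete, here is a rough Python sketch of how transcription segments could be merged into sentences. It assumes each segment is a dict with `start`, `end` (seconds) and `text` keys, which is the shape whisper's Python package returns in `result["segments"]`; the `Sentence` type, the `segments_to_sentences` helper, and the sample data are made up for illustration and are not part of ABS or ShelfPlayer.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def segments_to_sentences(segments):
    """Merge consecutive segments until the text ends with sentence punctuation."""
    sentences = []
    buffer = []
    start = None
    end = None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments (e.g. silence)
        if start is None:
            start = seg["start"]
        end = seg["end"]
        buffer.append(text)
        if text.endswith((".", "!", "?")):
            sentences.append(Sentence(start, end, " ".join(buffer)))
            buffer, start = [], None
    if buffer:
        # Trailing text without closing punctuation still becomes a sentence.
        sentences.append(Sentence(start, end, " ".join(buffer)))
    return sentences


# Hypothetical sample data in the assumed segment shape.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " The storm arrived at dusk,"},
    {"start": 2.5, "end": 5.0, "text": " and the harbor went quiet."},
    {"start": 5.0, "end": 7.2, "text": " Nobody slept that night."},
]

sentences = segments_to_sentences(sample_segments)
for s in sentences:
    print(f"[{s.start:6.1f}s - {s.end:6.1f}s] {s.text}")
```

Splitting on punctuation is a simplification; abbreviations like "Dr." would cut a sentence too early, so a real implementation would probably need something smarter.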
Using something like https://github.com/jzhang38/TinyLlama would probably suffice to then create summaries, but this requires the transcripts to exist in the first place.
And including an open-source multimodal model to do the transcripts locally is not really an option. The app is around 15MB right now; including even a small one would inflate that to at least 6GB.
i think someone in the ABS community will be attempting to solve this issue with an initial prototype
i've been tracking the convo here https://github.com/advplyr/audiobookshelf/issues/1723#issuecomment-2088749583
This will probably require a lot of work but for years i've been looking for an app that can take my bookmarks and create a summary from them
use case: take all my bookmarks and transcribe them from audio to text. max duration 60 secs to transcribe per bookmark
Basically what the Snipd podcast app is doing. It takes all my bookmarks from a podcast and generates a list and provides them as notes for me to review and dive deeper into that topic.
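A minimal Python sketch of the use case, assuming a timestamped transcript already exists: for each bookmark, pick the transcript text from the (at most) 60 seconds leading up to it. Treating the bookmark as the end of the interesting passage is an assumption, as are the segment shape (`start`, `end`, `text`), the function names, and the sample data; none of this is an existing ABS or ShelfPlayer API.

```python
def bookmark_window(position, max_duration=60.0):
    """Return (start, end) in seconds for the clip that ends at the bookmark."""
    start = max(0.0, position - max_duration)
    return start, position


def transcript_for_bookmark(segments, position, max_duration=60.0):
    """Join the text of every segment that overlaps the bookmark window."""
    start, end = bookmark_window(position, max_duration)
    parts = [
        seg["text"].strip()
        for seg in segments
        if seg["end"] > start and seg["start"] < end  # overlap check
    ]
    return " ".join(parts)


# Hypothetical transcript segments and bookmark positions (in seconds).
transcript = [
    {"start": 0.0, "end": 30.0, "text": "Intro."},
    {"start": 30.0, "end": 70.0, "text": "First idea."},
    {"start": 70.0, "end": 100.0, "text": "Second idea."},
    {"start": 100.0, "end": 130.0, "text": "Third idea."},
]
bookmarks = [20.0, 125.0]

notes = {pos: transcript_for_bookmark(transcript, pos) for pos in bookmarks}
for pos, text in notes.items():
    print(f"{pos:6.1f}s: {text}")
```

The resulting text per bookmark is what would then be handed to a summarization model; the 60 second cap keeps each clip small regardless of how long the audiobook is.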
lmk what you think 😊