rmusser01 / tldw

Too Long, Didn't Watch(TL/DW): Your Personal Research Multi-Tool - Open Source NotebookLM
Apache License 2.0

Improvement: Add SQLite DB [Done] #26

Closed by rmusser01 1 month ago

rmusser01 commented 1 month ago

As a user, I would like to store a list of the videos I have transcribed, along with their transcriptions and the summaries generated from them.

Further, I would like the ability to search my prior requests and review them.

Given how small this data is, keeping a personal archive of it should cost a negligible amount of storage.

It will allow record keeping of prior submissions, and comparison of summaries produced by different LLMs.

This would also require new elements in the Gradio UI for viewing and comparing prior generations.

Why sqlite: https://www.sqlite.org/whentouse.html
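The storage and search described above could be sketched roughly as follows. This is a minimal illustration using Python's standard `sqlite3` module, not the schema the project ended up with: the table names (`media`, `transcripts`, `summaries`), the column names, and the helper functions are all assumptions made for the example, and search is a plain `LIKE` match rather than anything more capable.

```python
import sqlite3

# Hypothetical schema: one row per video, one transcript per video,
# and any number of summaries per video (one per LLM tried).
SCHEMA = """
CREATE TABLE IF NOT EXISTS media (
    id    INTEGER PRIMARY KEY,
    url   TEXT NOT NULL UNIQUE,
    title TEXT
);
CREATE TABLE IF NOT EXISTS transcripts (
    id       INTEGER PRIMARY KEY,
    media_id INTEGER NOT NULL REFERENCES media(id),
    text     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS summaries (
    id         INTEGER PRIMARY KEY,
    media_id   INTEGER NOT NULL REFERENCES media(id),
    llm_name   TEXT NOT NULL,
    summary    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_video(conn, url, title, transcript):
    """Store a video and its transcript in one transaction; return the video id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO media (url, title) VALUES (?, ?)", (url, title)
        )
        media_id = cur.lastrowid
        conn.execute(
            "INSERT INTO transcripts (media_id, text) VALUES (?, ?)",
            (media_id, transcript),
        )
    return media_id


def add_summary(conn, media_id, llm_name, summary):
    """Record a summary of a video, tagged with the LLM that produced it."""
    with conn:
        conn.execute(
            "INSERT INTO summaries (media_id, llm_name, summary) VALUES (?, ?, ?)",
            (media_id, llm_name, summary),
        )


def search(conn, term):
    """Return (id, title) for videos whose title or transcript contains term."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT m.id, m.title FROM media m "
        "JOIN transcripts t ON t.media_id = m.id "
        "WHERE m.title LIKE ? OR t.text LIKE ? ORDER BY m.id",
        (pattern, pattern),
    ).fetchall()


def summaries_for(conn, media_id):
    """Return (llm_name, summary) pairs for one video, for side-by-side comparison."""
    return conn.execute(
        "SELECT llm_name, summary FROM summaries WHERE media_id = ? ORDER BY llm_name",
        (media_id,),
    ).fetchall()
```

Keeping summaries in their own table, keyed by video and LLM name, is what makes the comparison across models a single query; a Gradio view could call `summaries_for` and render the results next to each other.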

rmusser01 commented 1 month ago

Link dump for when I get around to this.

General DB:
https://en.wikipedia.org/wiki/Database_normalization
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/

Gradio:
https://www.gradio.app/guides/connecting-to-a-database
https://www.gradio.app/docs/gradio/dataset

SQLite:
https://www.sqlite.org/queryplanner.html
https://www.sqlite.org/optoverview.html
https://www.sqlite.org/queryplanner-ng.html
https://docs.python.org/3/library/sqlite3.html

SQLite Vector Search:
https://github.com/asg017/sqlite-vec
https://alexgarcia.xyz/blog/2024/building-new-vector-search-sqlite/index.html
https://news.ycombinator.com/item?id=40243168

SQLite DB Design:
https://stackoverflow.com/questions/66293837/smart-way-to-structure-my-sqlite-database?rq=3
https://stackoverflow.com/questions/7235435/sqlite-structure-advice?rq=3
https://stackoverflow.com/questions/19368506/very-basic-sqlite-table-design?rq=3
https://stackoverflow.com/questions/7665735/how-do-i-organize-such-database-in-sqlite?rq=3
https://stackoverflow.com/questions/29055263/sql-database-layout-design?rq=3

Seems like it might be interesting/relevant later: https://docs.llamaindex.ai/en/stable/examples/index_structs/struct_indices/SQLIndexDemo/

rmusser01 commented 1 month ago

Seems relevant. Granted, SQLite is not a graph DB.

https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/ https://www.youtube.com/watch?v=r09tJfON6kE

rmusser01 commented 1 month ago

Error handling, vacuuming, zipping transcriptions above X size, and external storage for documents and said zipped transcriptions.
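A rough sketch of how three of those items (error handling, compressing large transcriptions, and vacuuming) might look with Python's standard `sqlite3` and `zlib` modules. Everything named here is an assumption for illustration: the `transcripts` table, the `compressed` flag column, the helper functions, and the 10,000-byte threshold standing in for "X size". It uses zlib compression in place of literal zip archives, and it does not cover external storage.

```python
import sqlite3
import zlib

# Assumed value standing in for the "X size" threshold, in bytes.
SIZE_THRESHOLD = 10_000


def open_db(path=":memory:"):
    """Open the database and create the hypothetical transcripts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        " id INTEGER PRIMARY KEY,"
        " body BLOB NOT NULL,"
        " compressed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def store_transcript(conn, text, threshold=SIZE_THRESHOLD):
    """Store a transcript, zlib-compressing it if it exceeds the threshold."""
    raw = text.encode("utf-8")
    compressed = len(raw) > threshold
    body = zlib.compress(raw) if compressed else raw
    try:
        # The connection context manager commits on success, rolls back on error.
        with conn:
            cur = conn.execute(
                "INSERT INTO transcripts (body, compressed) VALUES (?, ?)",
                (body, int(compressed)),
            )
        return cur.lastrowid
    except sqlite3.Error as exc:
        raise RuntimeError(f"could not store transcript: {exc}") from exc


def load_transcript(conn, transcript_id):
    """Fetch a transcript and transparently decompress it if needed."""
    row = conn.execute(
        "SELECT body, compressed FROM transcripts WHERE id = ?", (transcript_id,)
    ).fetchone()
    if row is None:
        raise KeyError(transcript_id)
    body, compressed = row
    raw = zlib.decompress(body) if compressed else body
    return raw.decode("utf-8")


def vacuum(conn):
    """Rebuild the database file to reclaim space freed by deletions."""
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")
```

Storing a per-row `compressed` flag keeps small transcripts directly readable while letting large ones shrink; whether the threshold is worth the extra complexity depends on how large the transcripts actually get.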

rmusser01 commented 1 month ago

Done: SQLite implementation; ingestion of video transcripts + metadata; adding of keywords to video records when ingesting.
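For readers wondering what attaching keywords at ingestion time can look like, here is one possible shape, assuming a many-to-many link between videos and keywords. It is a sketch, not the code that was merged: the `media`, `keywords`, and `media_keywords` tables and the `ingest` and `find_by_keyword` helpers are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical schema: keywords are stored once and linked to videos
# through a join table, so one keyword can tag many videos.
SCHEMA = """
CREATE TABLE IF NOT EXISTS media (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL UNIQUE,
    transcript TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS keywords (
    id      INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS media_keywords (
    media_id   INTEGER NOT NULL REFERENCES media(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (media_id, keyword_id)
);
"""


def open_db(path=":memory:"):
    """Open the database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def ingest(conn, url, transcript, keywords):
    """Insert a video with its transcript and link its keywords, atomically."""
    # Normalize keywords: trim, lowercase, drop blanks and duplicates.
    cleaned = {k.strip().lower() for k in keywords if k.strip()}
    with conn:
        media_id = conn.execute(
            "INSERT INTO media (url, transcript) VALUES (?, ?)", (url, transcript)
        ).lastrowid
        for kw in cleaned:
            # Reuse an existing keyword row if there is one.
            conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (kw,))
            kw_id = conn.execute(
                "SELECT id FROM keywords WHERE keyword = ?", (kw,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO media_keywords (media_id, keyword_id) VALUES (?, ?)",
                (media_id, kw_id),
            )
    return media_id


def find_by_keyword(conn, keyword):
    """Return the URLs of all videos tagged with the given keyword."""
    rows = conn.execute(
        "SELECT m.url FROM media m "
        "JOIN media_keywords mk ON mk.media_id = m.id "
        "JOIN keywords k ON k.id = mk.keyword_id "
        "WHERE k.keyword = ? ORDER BY m.url",
        (keyword.strip().lower(),),
    ).fetchall()
    return [r[0] for r in rows]
```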

rmusser01 commented 1 month ago

Closing this issue, and opening new ones to track implementations of features.