rmusser01 closed this issue 1 month ago.
Link dump for when I get around to this.
General DB:
- https://en.wikipedia.org/wiki/Database_normalization
- https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/

Gradio:
- https://www.gradio.app/guides/connecting-to-a-database
- https://www.gradio.app/docs/gradio/dataset

SQLite:
- https://www.sqlite.org/queryplanner.html
- https://www.sqlite.org/optoverview.html
- https://www.sqlite.org/queryplanner-ng.html
- https://docs.python.org/3/library/sqlite3.html

SQLite Vector Search:
- https://github.com/asg017/sqlite-vec
- https://alexgarcia.xyz/blog/2024/building-new-vector-search-sqlite/index.html
- https://news.ycombinator.com/item?id=40243168

SQLite DB Design:
- https://stackoverflow.com/questions/66293837/smart-way-to-structure-my-sqlite-database?rq=3
- https://stackoverflow.com/questions/7235435/sqlite-structure-advice?rq=3
- https://stackoverflow.com/questions/19368506/very-basic-sqlite-table-design?rq=3
- https://stackoverflow.com/questions/7665735/how-do-i-organize-such-database-in-sqlite?rq=3
- https://stackoverflow.com/questions/29055263/sql-database-layout-design?rq=3
Seems like it might be interesting/relevant later: https://docs.llamaindex.ai/en/stable/examples/index_structs/struct_indices/SQLIndexDemo/
Seems relevant. Granted SQLite is not a GraphDB.
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/ https://www.youtube.com/watch?v=r09tJfON6kE
Error handling, vacuuming, zipping transcriptions above X size, and external storage for documents and said zipped transcriptions.
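A rough sketch of how the error handling, vacuuming, and size-based compression items could fit together with Python's `sqlite3` module. Everything named here is an assumption for illustration: the `transcripts` table, the helper names, and the 64 KiB threshold standing in for "X size". `zlib` is used in place of a zip archive, and external storage is not covered.

```python
import sqlite3
import zlib

# Hypothetical threshold standing in for "X size"; tune to taste.
COMPRESS_THRESHOLD_BYTES = 64 * 1024


def init_db(conn):
    """Create the (assumed) transcripts table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        " id INTEGER PRIMARY KEY,"
        " video_url TEXT NOT NULL UNIQUE,"
        " body BLOB NOT NULL,"
        " is_compressed INTEGER NOT NULL DEFAULT 0)"
    )


def pack_transcript(text, threshold=COMPRESS_THRESHOLD_BYTES):
    """Return (payload, is_compressed); compress only large transcripts."""
    raw = text.encode("utf-8")
    if len(raw) > threshold:
        return zlib.compress(raw), 1
    return raw, 0


def unpack_transcript(payload, is_compressed):
    """Reverse pack_transcript and return the original text."""
    raw = zlib.decompress(payload) if is_compressed else payload
    return raw.decode("utf-8")


def store_transcript(conn, video_url, text):
    """Insert one transcript; roll back and re-raise with context on failure."""
    payload, flag = pack_transcript(text)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO transcripts (video_url, body, is_compressed)"
                " VALUES (?, ?, ?)",
                (video_url, payload, flag),
            )
    except sqlite3.Error as exc:
        raise RuntimeError(f"failed to store transcript for {video_url}") from exc


def load_transcript(conn, video_url):
    """Fetch and decode one transcript, or return None if it is missing."""
    row = conn.execute(
        "SELECT body, is_compressed FROM transcripts WHERE video_url = ?",
        (video_url,),
    ).fetchone()
    return None if row is None else unpack_transcript(row[0], row[1])


def vacuum(conn):
    """Reclaim free pages; VACUUM must run outside an open transaction."""
    conn.commit()
    conn.execute("VACUUM")
```

Storing the flag next to the payload keeps small transcripts readable as plain bytes and lets the threshold change later without rewriting old rows.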
Done:
- SQLite implementation
- Ingestion of video transcripts + metadata
- Adding of keywords to video records when ingesting
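For reference, a minimal sketch of what ingestion with keywords might look like. This is not the project's actual schema: the table names (`videos`, `keywords`, `video_keywords`), the columns, and the `ingest_video` helper are all assumptions chosen to show a normalized many-to-many layout.

```python
import sqlite3

# Hypothetical schema: one row per video, keywords shared via a join table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS videos (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    title TEXT,
    transcript TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS keywords (
    id INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS video_keywords (
    video_id INTEGER NOT NULL REFERENCES videos(id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id) ON DELETE CASCADE,
    PRIMARY KEY (video_id, keyword_id)
);
"""


def open_db(path=":memory:"):
    """Open a connection, enable foreign keys, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def ingest_video(conn, url, title, transcript, keywords):
    """Insert a video and link its keywords in a single transaction."""
    # Normalize keywords so "SQLite" and "sqlite " map to the same row.
    cleaned = sorted({k.strip().lower() for k in keywords if k.strip()})
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO videos (url, title, transcript) VALUES (?, ?, ?)",
            (url, title, transcript),
        )
        video_id = cur.lastrowid
        for kw in cleaned:
            conn.execute(
                "INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (kw,)
            )
            keyword_id = conn.execute(
                "SELECT id FROM keywords WHERE keyword = ?", (kw,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO video_keywords (video_id, keyword_id) VALUES (?, ?)",
                (video_id, keyword_id),
            )
    return video_id
```

Keeping keywords in their own table means a keyword is stored once and can be used to look up every video tagged with it.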
Closing this issue, and opening new ones to track implementations of features.
As a user, I would like the ability to store a list of the videos I have transcribed, along with their transcriptions and the summaries generated from them.
Further, I would like the ability to search my prior requests and review them.
Given how small this data is, the storage cost of maintaining a personal archive of it should be negligible.
It would allow record keeping of prior submissions and comparison of summaries produced by different LLMs.
This would also call for new elements in the Gradio UI for viewing and comparing prior generations.
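A small sketch of the search and comparison parts of this request, assuming a hypothetical two-table layout (`transcripts` and `summaries`, with one summary row per model). The helper names are made up for illustration, and plain `LIKE` matching is used only because it needs no extensions.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open a connection and create the (assumed) archive tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS transcripts (
            id INTEGER PRIMARY KEY,
            video_url TEXT NOT NULL UNIQUE,
            body TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS summaries (
            id INTEGER PRIMARY KEY,
            transcript_id INTEGER NOT NULL REFERENCES transcripts(id),
            model TEXT NOT NULL,
            summary TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        """
    )
    return conn


def search_transcripts(conn, term):
    """Return (id, video_url) for transcripts whose body contains term."""
    # Escape LIKE wildcards so the term is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        "SELECT id, video_url FROM transcripts"
        " WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    ).fetchall()


def summaries_by_model(conn, transcript_id):
    """Map model name to its summary; the newest row wins per model."""
    rows = conn.execute(
        "SELECT model, summary FROM summaries"
        " WHERE transcript_id = ? ORDER BY created_at, id",
        (transcript_id,),
    ).fetchall()
    return dict(rows)
```

The dictionary returned by `summaries_by_model` is the kind of structure a side-by-side comparison view could render, one column per model.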
Why SQLite: https://www.sqlite.org/whentouse.html