mrseanryan / gpt-dm

Data modelling via natural language using an LLM. Outputs JSON or SQL. Also generates Test data in SQL or CSV format.
MIT License

Add feature to summarize a large schema #12

Open mrseanryan opened 1 month ago

mrseanryan commented 1 month ago

Input: a list of tables, with their properties.

Output: a high-level summary.

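Use an LLM, or just an embedding model such as SBERT (or word2vec), and then cluster the embeddings? DistilBERT (uncased) is another option. With any embedding approach, each cluster still needs to be given a name.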
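A minimal sketch of the embed, cluster, and name idea, to make the moving parts concrete. Everything here is an assumption for illustration: `embed` is a toy stand-in (normalised token counts) where a real SBERT or DistilBERT embedding would go, the greedy grouping and the `threshold` value are placeholders for a proper clustering algorithm, and `name_cluster` simply picks the most common token rather than asking an LLM for a name.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: L2-normalised token counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def similarity(v, w):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    return sum(x * w.get(tok, 0.0) for tok, x in v.items())


def cluster_tables(tables, threshold=0.3):
    """Greedy grouping: a table joins the first cluster whose seed is similar enough."""
    clusters = []  # each item: (seed_vector, [table names])
    for name, columns in tables.items():
        vec = embed(name + " " + " ".join(columns))
        for seed, members in clusters:
            if similarity(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


def name_cluster(members):
    """Hypothetical naming rule: most common token across the table names."""
    tokens = Counter(t for m in members for t in re.findall(r"[a-z]+", m.lower()))
    return tokens.most_common(1)[0][0]


# Hypothetical example schema: table name -> column names.
tables = {
    "customer": ["customer_id", "name", "email"],
    "customer_address": ["customer_id", "street", "city"],
    "product": ["product_id", "title", "price"],
    "product_review": ["product_id", "rating", "comment"],
}
summary = {name_cluster(m): m for m in cluster_tables(tables)}
print(summary)
```

With this toy schema the tables fall into a "customer" group and a "product" group. A real schema would likely need a stronger embedding and a clustering method that does not depend on the order of the tables.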

The dot product of two normalised vectors equals the cosine of the angle between them. Cosine distance = 1 - v.w, so the smaller the distance, the closer the vectors.
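The relationship above can be checked with a few lines of plain Python. This is only a sketch: the function names `normalise` and `cosine_distance` are made up for illustration, and a zero vector is not handled.

```python
import math


def normalise(v):
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(v, w):
    """1 minus the dot product of the normalised vectors."""
    v, w = normalise(v), normalise(w)
    return 1.0 - sum(a * b for a, b in zip(v, w))


# Same direction, orthogonal, and opposite direction.
same = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
print(same, orthogonal, opposite)
```

The distance is close to 0 for vectors pointing the same way, 1 for orthogonal vectors, and 2 for opposite vectors, so ranking by smallest distance picks the most similar tables first.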