Open mrseanryan opened 1 month ago
take a list of tables, with their properties
output high level summary:
classify tables - but grouping via associations could be more important.
table summary
class description
higher level class of classes (top level category, then table category)
use an LLM, or just an embedding model like SBERT (or word2vec) and then cluster? DistilBERT (uncased) is another option, but the clusters would still need to be named
dot product of 2 normalised vectors = cos(angle between them); cosine distance = 1 - v·w; smaller distance means closer
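
A minimal sketch of the cosine distance note above, using plain Python lists as stand-ins for embedding vectors. The function names `normalise` and `cosine_distance` are made up for illustration; a real embedding model would supply the vectors.

```python
import math


def normalise(v):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def cosine_distance(v, w):
    """Return 1 - cos(angle) between v and w; smaller means closer."""
    v_hat, w_hat = normalise(v), normalise(w)
    # For unit vectors the dot product equals the cosine of the angle.
    dot = sum(a * b for a, b in zip(v_hat, w_hat))
    return 1.0 - dot
```

The distance is 0 for vectors pointing the same way, 1 for orthogonal vectors, and 2 for opposite vectors, so it can be fed directly to a clustering step that expects "smaller is closer".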
user can add categories. different views (sets of categories).
split entity names into word stems by their casing (e.g. CamelCase)
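
One possible way to break entity names into words by their casing before embedding them, as a rough sketch. The function name `split_entity_name` and the choice to lowercase the parts are assumptions, not something decided in this issue.

```python
import re

# Matches, in order: an acronym followed by a capitalised word ("HTTP" in
# "HTTPRequest"), a capitalised or lowercase word, a trailing acronym, digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_entity_name(name):
    """Split a table or column name into lowercase words (hypothetical helper)."""
    return [word.lower() for word in _WORD.findall(name)]
```

For example, `split_entity_name("CustomerOrder")` gives `["customer", "order"]`, and snake_case names such as `order_line_item` split on the underscores as well, since underscores never match the pattern.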