Reassign companies to their closest text cluster across all SIC codes using FAISS.
Steps:
Calculate text sector centroids based on the vector representations of their companies
Assign each company to its closest text sector
Recalculate text sector centroids
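Reassign each company to its closest text sector
Repeat the last two steps until convergence, for example until no company changes text sector between iterations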
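The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names (`compute_centroids`, `nearest_centroid`, `reassign_until_convergence`), the `max_iter` cap, and the choice of squared L2 distance are all assumptions. The nearest-centroid search uses `faiss.IndexFlatL2` when FAISS is installed and falls back to a brute-force NumPy search otherwise. Initial labels are assumed to be integer sector ids, and sectors that lose all their companies are simply dropped.

```python
import numpy as np

try:
    import faiss  # optional: exact nearest-neighbour search when installed
except ImportError:
    faiss = None


def compute_centroids(vectors, labels):
    """Return (sector ids, centroid matrix); empty sectors are dropped."""
    sector_ids = np.unique(labels)
    centroids = np.stack([vectors[labels == k].mean(axis=0) for k in sector_ids])
    return sector_ids, centroids.astype(np.float32)


def nearest_centroid(vectors, centroids):
    """Return (row index of closest centroid, squared L2 distance) per company."""
    if faiss is not None:
        index = faiss.IndexFlatL2(centroids.shape[1])  # exact L2 index
        index.add(np.ascontiguousarray(centroids, dtype=np.float32))
        dist, idx = index.search(np.ascontiguousarray(vectors, dtype=np.float32), 1)
        return idx[:, 0], dist[:, 0]
    # NumPy fallback: all pairwise squared distances, shape (n_companies, n_sectors)
    d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, d2[np.arange(len(vectors)), idx]


def reassign_until_convergence(vectors, labels, max_iter=100):
    """Alternate centroid updates and reassignment until labels stop changing."""
    vectors = np.asarray(vectors, dtype=np.float32)
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        # Recalculate centroids from the current assignment.
        sector_ids, centroids = compute_centroids(vectors, labels)
        # Assign each company to its closest text sector.
        nearest, _ = nearest_centroid(vectors, centroids)
        new_labels = sector_ids[nearest]
        if np.array_equal(new_labels, labels):
            break  # converged: no company changed sector
        labels = new_labels
    # Recompute once so the returned centroids match the returned labels.
    sector_ids, centroids = compute_centroids(vectors, labels)
    return labels, sector_ids, centroids
```

If the company vectors are L2-normalised beforehand, ranking by squared L2 distance gives the same nearest sector as ranking by cosine similarity, which is one reason this sketch defaults to L2.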
Some observations:
What distance do we use?
We would like to generate a confidence score for each company so that we can report results with various levels of confidence. Could this be based on the distribution of company distances to a text sector?
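One possible way to turn the distance distribution into a confidence score, sketched under assumptions: within each text sector, score a company by the fraction of that sector's companies that lie strictly farther from the centroid than it does. The function name `confidence_from_distances` and this particular definition are hypothetical choices for illustration, not a settled answer to the question above.

```python
import numpy as np


def confidence_from_distances(distances, labels):
    """Per-company confidence in [0, 1): share of same-sector companies farther away."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    confidence = np.empty_like(distances)
    for k in np.unique(labels):
        mask = labels == k
        d = distances[mask]
        sorted_d = np.sort(d)
        # Number of sector members with a strictly larger distance than each company.
        farther = len(d) - np.searchsorted(sorted_d, d, side="right")
        confidence[mask] = farther / len(d)
    return confidence
```

With this definition, a company sitting closest to its sector centroid scores near 1 and the farthest scores 0, so results can be reported at different thresholds. A score based on the gap between the closest and second-closest sector would be another option worth comparing.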