nestauk / industrial_taxonomy

Refactor of nestauk/industrial-taxonomy which upon completion will replace it.
MIT License

Reassign text clusters #21

Closed Juan-Mateos closed 2 years ago

Juan-Mateos commented 2 years ago

Reassign companies to their closest text cluster across all SIC codes using FAISS.

Steps:

  1. Calculate text sector centroids based on the vector representations of their companies
  2. Assign each company to its closest text sector
  3. Recalculate text sector centroids
  4. Assign each company to its closest text sector again, repeating steps 2 and 3 until convergence (i.e. until assignments stop changing)
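
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code in this repo: the function names (`nearest_centroid`, `reassign_until_convergence`), the `max_iter` cap, and the choice to drop sectors that end up empty are all assumptions. It uses an exact `faiss.IndexFlatL2` search when FAISS is installed and falls back to a brute-force NumPy search otherwise; the issue does not say which FAISS index is intended.

```python
import numpy as np

try:
    import faiss  # optional: used for the nearest-centroid search if installed
except ImportError:
    faiss = None


def nearest_centroid(vectors, centroids):
    """Return, for each row of `vectors`, the row index of its closest centroid (L2)."""
    if faiss is not None:
        # Exact L2 index over the centroids; FAISS expects contiguous float32 arrays.
        index = faiss.IndexFlatL2(centroids.shape[1])
        index.add(np.ascontiguousarray(centroids, dtype=np.float32))
        _, nearest = index.search(np.ascontiguousarray(vectors, dtype=np.float32), 1)
        return nearest[:, 0]
    # Brute-force fallback: squared distances between every vector and every centroid.
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def reassign_until_convergence(vectors, labels, max_iter=100):
    """Hypothetical helper: iterate steps 1-4 until assignments stop changing."""
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        # Steps 1 and 3: centroid of each sector that still has companies
        # (assumption: sectors that lose all their companies are dropped).
        sectors = np.unique(labels)
        centroids = np.stack([vectors[labels == s].mean(axis=0) for s in sectors])
        # Step 2: assign each company to the sector with the closest centroid.
        new_labels = sectors[nearest_centroid(vectors, centroids)]
        # Step 4: stop once no company changes sector.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Here `vectors` would be an `(n_companies, dim)` array of company embeddings and `labels` the initial text sector of each company, with sectors from all SIC codes pooled together so a company can move to any of them.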

Some observations: