Train an SBM topic model on business descriptions by SIC sector and extract the resulting clusters.
Some questions:
What is the minimum sector size for inclusion? (The previous version used a minimum of 2000 companies per SIC4 code.)
Do we assign companies to clusters exhaustively, or do we only include the companies closest to each cluster? The latter is likely to yield more informative clusters. We can classify the unassigned companies into text sectors afterwards using the FAISS method.
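
To make the two questions concrete, here is a minimal sketch of what each option could look like. It is not the project's implementation. The function names, the use of cosine similarity, the idea that each cluster has a centroid vector, and the `min_similarity` cutoff are all assumptions made for illustration. The topic model may expose cluster membership in a different form. Companies returned as `None` are the ones that would be left for the FAISS step, which is not shown here.

```python
import math
from collections import Counter

# Threshold used in the previous version; the new value is an open question.
MIN_SECTOR_SIZE = 2000


def eligible_sectors(sic4_codes, min_size=MIN_SECTOR_SIZE):
    """Return the SIC4 codes that have at least `min_size` companies.

    `sic4_codes` holds one SIC4 code per company (hypothetical input format).
    """
    counts = Counter(sic4_codes)
    return {code for code, n in counts.items() if n >= min_size}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_closest(vectors, centroids, min_similarity):
    """Non-exhaustive assignment (assumed approach, not the author's method).

    Each company goes to its most similar cluster centroid, but only if the
    similarity reaches `min_similarity`. Otherwise it stays unassigned (None).
    Setting `min_similarity` to -1.0 gives the exhaustive variant.
    """
    assignments = []
    for vec in vectors:
        best_id, best_sim = max(
            ((cid, cosine(vec, centre)) for cid, centre in centroids.items()),
            key=lambda pair: pair[1],
        )
        assignments.append(best_id if best_sim >= min_similarity else None)
    return assignments
```

Raising `min_similarity` trades coverage for cluster purity, so the share of companies left as `None` is one way to compare candidate cutoffs before settling the second question.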