
nestauk / industrial_taxonomy

Refactor of nestauk/industrial-taxonomy, which it will replace upon completion.

MIT License


Post-process text clusters #20

Open · Juan-Mateos opened this issue 2 years ago

Juan-Mateos commented 2 years ago

The text clustering step described in #19 is likely to yield some noisy clusters, which we would like to remove from the analysis before the reassignment stage. Some options to explore here:

- Calculate silhouette scores for each cluster and remove clusters that score below a threshold (the threshold is still to be decided).
- Identify salient terms in each cluster and analyse their pairwise similarity using word2vec or a similar word-embedding model.
  - Caveat: this could lead to the removal of novel and crossover sectors.
- Other approaches...
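A minimal sketch of the silhouette option, assuming each document in a cluster has a numeric vector (for example the embeddings used in the #19 clustering step) and that Euclidean distance is appropriate; for text embeddings, cosine distance may be a better fit. The helper names `silhouette_per_sample` and `filter_noisy_clusters` and the default `threshold=0.1` are hypothetical, since the threshold is still an open question. Per-sample scores follow the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), which is also what `sklearn.metrics.silhouette_samples` computes, so that function could replace the hand-written one.

```python
import numpy as np


def silhouette_per_sample(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Silhouette s(i) = (b - a) / max(a, b) for every sample, using Euclidean distance.

    Assumes at least two clusters. Builds an n x n distance matrix, so this is
    only suitable for modest n (a sketch, not an optimised implementation).
    """
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: samples in singleton clusters get s(i) = 0
        # a(i): mean distance to the other members of i's own cluster (dist[i, i] is 0).
        a = dist[i, own].sum() / (n_own - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


def filter_noisy_clusters(X: np.ndarray, labels: np.ndarray, threshold: float = 0.1):
    """Split clusters into kept {label: mean silhouette} and dropped [label, ...].

    Assumes integer cluster labels; `threshold` is a placeholder value.
    """
    scores = silhouette_per_sample(X, labels)
    kept, dropped = {}, []
    for c in np.unique(labels):
        mean_s = float(scores[labels == c].mean())
        if mean_s >= threshold:
            kept[int(c)] = mean_s
        else:
            dropped.append(int(c))
    return kept, dropped


# Toy data: two tight clusters (0 and 1) and one spread-out, noisy cluster (2).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [5.0, 0.0], [5.0, 10.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
kept, dropped = filter_noisy_clusters(X, labels)
print(sorted(kept))  # [0, 1]
print(dropped)       # [2]
```

Averaging the per-sample scores within each cluster gives one number per cluster; a cluster whose members sit about as close to other clusters as to each other (mean near zero or negative) is a candidate for removal.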
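A minimal sketch of the salient-terms option, under several assumptions. Documents are already tokenised, and "salience" is taken to be a term's share of a cluster's tokens divided by its share of the whole corpus; this simple frequency ratio is a stand-in chosen for illustration, and TF-IDF or other measures would work too. Coherence is the mean pairwise cosine similarity of a cluster's salient terms. The helper names (`salient_terms`, `term_coherence`) and the hand-made 3-d vectors are hypothetical; with a trained word2vec model, gensim's `KeyedVectors.similarity(w1, w2)` could replace `cosine`.

```python
import math
from collections import Counter
from itertools import combinations


def salient_terms(cluster_docs, corpus_docs, top_n=5):
    """Rank terms by how over-represented they are in a cluster relative to the corpus.

    Assumes tokenised documents and that cluster_docs is a subset of corpus_docs.
    """
    in_cluster = Counter(tok for doc in cluster_docs for tok in doc)
    overall = Counter(tok for doc in corpus_docs for tok in doc)
    n_cluster = sum(in_cluster.values())
    n_total = sum(overall.values())
    # Salience = term's share of cluster tokens / term's share of corpus tokens.
    salience = {
        term: (count / n_cluster) / (overall[term] / n_total)
        for term, count in in_cluster.items()
    }
    # Descending salience, ties broken alphabetically for stable output.
    return sorted(salience, key=lambda t: (-salience[t], t))[:top_n]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_coherence(terms, vectors):
    """Mean pairwise cosine similarity of the terms that have an embedding.

    Returns 0.0 when fewer than two terms are in the vocabulary.
    """
    known = [t for t in terms if t in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Toy example: hypothetical 3-d vectors standing in for word2vec embeddings.
corpus = [
    ["software", "cloud", "company"],
    ["cloud", "software", "services"],
    ["bakery", "bread", "company"],
]
vectors = {
    "software": [1.0, 0.0, 0.0],
    "cloud": [0.9, 0.1, 0.0],
    "bakery": [0.0, 1.0, 0.0],
    "bread": [0.0, 0.9, 0.1],
    "mining": [0.0, 0.0, 1.0],
}
terms = salient_terms(corpus[:2], corpus, top_n=3)
print(terms, term_coherence(terms, vectors))
```

A low coherence score marks a cluster as a removal candidate, but, as the caveat above notes, novel and crossover sectors may legitimately combine distant vocabulary, so such clusters might be better flagged for manual review than dropped automatically.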