Closed by nicholas-leonard 10 years ago
We create a table mapping each word to its context words and their counts by unnesting the table of sentence arrays and performing a group by. We use these bags of words to generate similarity arrows between words, and we use those arrows to cluster the words. We then generate a new table of arrows that omits the within-cluster arrows of the previous clustering and cluster again on what remains. We repeat this until no arrows are left.
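The loop described above can be sketched in plain Python instead of SQL. This is only an illustration under assumptions the issue does not state: context is taken to be a fixed window within the sentence, similarity is cosine over the context counts with a cutoff, and clustering is a greedy size-capped merge along the strongest arrows. All function names and parameters (`window`, `threshold`, `max_size`) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def context_counts(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions (assumed context)."""
    bags = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    bags[word][sentence[j]] += 1
    return bags


def cosine(a, b):
    """Cosine similarity between two bags of words (assumed similarity measure)."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def similarity_arrows(bags, threshold=0.3):
    """Keep one weighted arrow per word pair whose similarity reaches the cutoff."""
    arrows = {}
    for u, v in combinations(sorted(bags), 2):
        score = cosine(bags[u], bags[v])
        if score >= threshold:
            arrows[(u, v)] = score
    return arrows


def cluster(words, arrows, max_size=3):
    """Greedy stand-in clustering: merge along the strongest arrows, capping cluster size."""
    parent = {w: w for w in words}
    size = {w: 1 for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (u, v), _ in sorted(arrows.items(), key=lambda kv: (-kv[1], kv[0])):
        ru, rv = find(u), find(v)
        if ru != rv and size[ru] + size[rv] <= max_size:
            parent[rv] = ru
            size[ru] += size[rv]
    return {w: find(w) for w in words}


def iterate_clusterings(words, arrows, max_size=3):
    """Cluster, drop within-cluster arrows, and repeat until no arrows are left."""
    if max_size < 2:
        raise ValueError("max_size must be at least 2 or the loop cannot make progress")
    levels = []
    while arrows:
        labels = cluster(words, arrows, max_size)
        levels.append(labels)
        arrows = {(u, v): s for (u, v), s in arrows.items() if labels[u] != labels[v]}
    return levels
```

Each pass merges at least the two endpoints of the strongest remaining arrow, so at least one arrow is removed per pass and the loop terminates. A size cap (or some other limit) matters here: a plain connected-components clustering would absorb every arrow in the first pass and leave nothing for a second one.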
Finished primary word hierarchy.
Finished secondary word hierarchy.
Started tertiary word hierarchy.
The experts will share a multi-hierarchical softmax layer. Multiple non-overlapping trees will be generated before training using similarity graph clustering. No further clustering will be performed during training.
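As a rough illustration of a softmax over several fixed trees, the sketch below scores a word under each two-level tree as P(cluster) * P(word | cluster) and then averages across trees. The averaging rule, the two-level depth, and the names `tree_word_prob` and `multi_tree_word_prob` are assumptions made for this example; the issue does not say how the trees are combined. The score lists stand in for whatever the experts' hidden state would produce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tree_word_prob(word, clusters, cluster_scores, word_scores):
    """P(word) under one two-level tree: P(cluster) * P(word | cluster).

    clusters: list of lists of words, a partition of the vocabulary (fixed before training).
    cluster_scores: one score per cluster; word_scores: dict word -> score (placeholders).
    """
    cluster_id = next(i for i, members in enumerate(clusters) if word in members)
    members = clusters[cluster_id]
    p_cluster = softmax(cluster_scores)[cluster_id]
    p_word = softmax([word_scores[w] for w in members])[members.index(word)]
    return p_cluster * p_word


def multi_tree_word_prob(word, trees):
    """Average the per-tree probabilities (an assumed combination rule)."""
    probs = [tree_word_prob(word, *tree) for tree in trees]
    return sum(probs) / len(probs)


# Two different partitions of the same vocabulary, each with placeholder scores.
vocab = ["cat", "dog", "ran", "sat"]
trees = [
    ([["cat", "dog"], ["ran", "sat"]], [0.5, -0.5],
     {"cat": 1.0, "dog": 0.0, "ran": 0.2, "sat": 0.3}),
    ([["cat", "sat"], ["dog", "ran"]], [0.1, 0.4],
     {"cat": -1.0, "dog": 0.5, "ran": 0.0, "sat": 2.0}),
]
total = sum(multi_tree_word_prob(w, trees) for w in vocab)
```

Because each tree defines a proper distribution over the vocabulary, their average does too, so `total` comes out to 1 up to floating-point error. Only the softmax over clusters and the softmax within one cluster are evaluated per tree, which is the usual cost argument for a hierarchical softmax.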